Abstract

In this paper, a convergence proof for the recently proposed cost function optimization sparse 
possibilistic c-means (SPCM) algorithm is provided. Specifically, it is shown that the algorithm 
will converge to one of the local minima of its associated cost function. It is also shown that 
similar convergence results can be derived for the well-known possibilistic c-means (PCM) algorithm 
proposed in [5], if we view it as a special case of SPCM. Note that the convergence results for
PCM are stronger than those established in previous works. 

Index Terms 

Possibilistic clustering, sparsity, convergence, sparse possibilistic c-means (SPCM) 


I. Introduction 

In most of the well-known clustering algorithms that deal with the identification of compact
and hyperellipsoidally shaped clusters, each cluster is represented by a vector called cluster
representative that lies in the same feature space as the data vectors. In order to identify the
underlying clustering structure, such algorithms gradually move the representatives from their
initial (usually randomly selected) locations towards the "center" of each cluster. Apart from the
hard clustering philosophy, where each data vector belongs exclusively to a single cluster (e.g.
k-means [1]), and the fuzzy clustering philosophy, where each data vector is shared among the
clusters (e.g. fuzzy c-means (FCM) [?]), an alternative well-known clustering philosophy
that has been developed, in order to deal with this case, is the possibilistic clustering one,
where the degree of compatibility of a data vector with a given cluster is independent of its
degrees of compatibility with any other cluster. Algorithms of this kind, known as possibilistic
c-means algorithms (PCMs), iteratively optimize suitably defined cost functions (e.g. [4], [5],
[?]), aiming at moving the cluster representatives to regions that are dense in
data points. A very well-known PCM algorithm, introduced in [4] and noted as PCM$_1$, is
derived from the minimization of the cost function

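$$J_{PCM_1}(U,\Theta) = \sum_{i=1}^{N}\sum_{j=1}^{m} u_{ij}^{q}\,\|\mathbf{x}_i-\boldsymbol{\theta}_j\|^2 + \sum_{j=1}^{m}\gamma_j\sum_{i=1}^{N}(1-u_{ij})^{q} \tag{1}$$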

while an alternative PCM algorithm, presented in [5] and noted as PCM$_2$, is derived from
the minimization of the cost function

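$$J_{PCM_2}(U,\Theta) = \sum_{i=1}^{N}\sum_{j=1}^{m} u_{ij}\,\|\mathbf{x}_i-\boldsymbol{\theta}_j\|^2 + \sum_{j=1}^{m}\gamma_j\sum_{i=1}^{N}(u_{ij}\ln u_{ij}-u_{ij}) \tag{2}$$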

where $\mathbf{x}_i$, $i = 1,\dots,N$, denotes the $i$th out of $N$ $l$-dimensional data points of the data set
$X$ under study, $\boldsymbol{\theta}_j$'s, $j = 1,\dots,m$, denote the representatives of the $m$ clusters (each one
denoted by $C_j$), which constitute the set $\Theta$. $U$ is the matrix whose $(i,j)$ element $u_{ij}$ stands
for the degree of compatibility of the $i$th data vector $\mathbf{x}_i$ with the $j$th representative $\boldsymbol{\theta}_j$. Finally,
$\gamma_j$'s are positive parameters, each one associated with a cluster $C_j$.^1

Convergence results for these algorithms have been presented, utilizing the Zangwill convergence
theorem [10]. It is shown that either (a) the iterative sequence generated by a PCM converges
to a local minimizer or a saddle point of the cost function associated with the algorithm, or
(b) any of its convergent subsequences converges to either a local minimizer or a saddle point
of the cost function [11]. It is noteworthy that Zangwill's theorem [10] has been used to
establish convergence properties for the FCM algorithm as well (e.g. [?]).^2

^1 Note that, in contrast to $J_{PCM_2}$, $J_{PCM_1}$ involves an additional parameter $q$, which takes values around 2.

^2 A different approach for proving the convergence of the FCM to a stationary point of the corresponding cost function
is given in [?]. A related work is also provided in [?].

Recently, a novel possibilistic clustering algorithm, called Sparse Possibilistic C-Means
(SPCM) [?], has been proposed, which extends PCM$_2$ by introducing sparsity. More specifically,
a suitable sparsity constraint is imposed on the vectors containing the degrees of
compatibility of the data points with the clusters (one vector per point^3), such that each data
vector is compatible with only a few clusters, or even with none. In the present work, an analysis of
the convergence properties of the SPCM algorithm is conducted and it is shown that the iterative
sequence generated by SPCM converges to a local minimum of its associated cost function
$J_{SPCM}$, which is defined explicitly in the next section. A significant source of difficulty
in the convergence analysis of SPCM is the extra term added to the cost function
$J_{PCM_2}$, as explained in the next section, which is responsible for sparsity imposition and
constitutes the main novelty of SPCM. This affects the updating of the degrees of compatibility,
which are no longer given in closed form and are computed via a two-branch expression.

Moreover, it is shown that the above convergence analysis for SPCM is directly applicable
to the PCM$_2$ algorithm ([5]) and the obtained convergence results are much stronger than
those provided in [11].

The rest of the paper is organized as follows. In Section II, a brief description of the SPCM
algorithm is given for reasons of thoroughness and in Section III its convergence proof is
analyzed. In Section IV the convergence results from the previous section are applied to the
case of PCM$_2$. Finally, Section V concludes the paper.

II. The Sparse PCM (SPCM) algorithm 

Let $X = \{\mathbf{x}_i \in \mathcal{R}^l,\ i = 1,\dots,N\}$ be the data set under study, $\Theta = \{\boldsymbol{\theta}_j \in \mathcal{R}^l,\ j = 1,\dots,m\}$
be a set of $m$ vectors that will be used for the representation of the clusters formed in
$X$ (cluster representatives) and $U = [u_{ij}]$, $i = 1,\dots,N$, $j = 1,\dots,m$, be an $N \times m$ matrix
whose $(i,j)$ element stands for the degree of compatibility of $\mathbf{x}_i$ with the $j$th cluster. Let
also $\mathbf{u}_i^T = [u_{i1},\dots,u_{im}]$ be the (row) vector containing the elements of the $i$th row of $U$. In
what follows we consider only Euclidean norms, denoted by $\|\cdot\|$.

As stated earlier, the strategy of a possibilistic algorithm is to move the vectors
$\boldsymbol{\theta}_j$'s towards regions that are dense in data points of $X$ (clusters). The aim of SPCM is
two-fold: (a) to retain the sparser clusters, provided of course that at least one representative has
been initially placed in each one of them and (b) to prevent noisy points from contributing to
the computation of any of the $\boldsymbol{\theta}_j$'s. This is achieved by suppressing the contribution of data
points that are distant from a representative $\boldsymbol{\theta}_j$ in its updating. More specifically, focusing on
a specific representative $\boldsymbol{\theta}_j$, this can be achieved by setting $u_{ij} = 0$ for data points $\mathbf{x}_i$ that are
distant from it. This is tantamount to imposing sparsity on $\mathbf{u}_i$, i.e., forcing the corresponding
data point $\mathbf{x}_i$ to contribute only to its (currently) closest representatives. To this end, the cost
function $J_{PCM_2}$ of eq. (2) is augmented as follows.

^3 Clearly, these vectors are the rows of the matrix $U$.


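$$J_{SPCM}(U,\Theta) = \sum_{j=1}^{m}\left[\sum_{i=1}^{N} u_{ij}\,\|\mathbf{x}_i-\boldsymbol{\theta}_j\|^2 + \gamma_j\sum_{i=1}^{N}(u_{ij}\ln u_{ij}-u_{ij})\right] + \lambda\sum_{i=1}^{N}\|\mathbf{u}_i\|_p^p, \quad u_{ij} > 0 \tag{3}$$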


where $\|\mathbf{u}_i\|_p$ is the $\ell_p$-norm of vector $\mathbf{u}_i$ ($p \in (0,1)$); thus, $\|\mathbf{u}_i\|_p^p = \sum_{j=1}^{m} u_{ij}^p$. Each $\gamma_j$
indicates the degree of "influence" of $C_j$ around its representative $\boldsymbol{\theta}_j$; the smaller (greater)
the value of $\gamma_j$, the smaller (greater) the influence of cluster $C_j$ around $\boldsymbol{\theta}_j$. The last term
in eq. (3) is expected to induce sparsity on each one of the vectors $\mathbf{u}_i$ and $\lambda$ ($\geq 0$) is
a regularization parameter that controls the degree of the imposed sparsity. The algorithm
resulting from the minimization of $J_{SPCM}(U,\Theta)$ is called the sparse possibilistic c-means (SPCM)
clustering algorithm and it is briefly discussed below (its detailed presentation is given in
[?]).
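As an illustration, the cost of eq. (3) can be evaluated numerically along the following lines. This is a minimal NumPy sketch and not the authors' code: the function name `spcm_cost` and the array layout (`X` of shape N x l, `Theta` of shape m x l, `U` of shape N x m) are assumptions made here, and the term $u \ln u$ is extended to $u = 0$ by its limit, 0.

```python
import numpy as np

def spcm_cost(X, Theta, U, gamma, lam, p):
    """Evaluate J_SPCM(U, Theta) for data X, representatives Theta,
    compatibilities U, per-cluster gamma, sparsity weight lam and 0 < p < 1.
    (Hypothetical helper; names and shapes are illustrative.)"""
    # Squared Euclidean distances d_ij = ||x_i - theta_j||^2, shape (N, m).
    d = ((X[:, None, :] - Theta[None, :, :]) ** 2).sum(axis=2)
    # u * ln(u), with the value at u = 0 replaced by its limit (0).
    safe_U = np.where(U > 0, U, 1.0)
    u_ln_u = np.where(U > 0, U * np.log(safe_U), 0.0)
    fit = (U * d).sum()                                  # sum_ij u_ij d_ij
    entropy = (gamma[None, :] * (u_ln_u - U)).sum()      # sum_j gamma_j sum_i (u ln u - u)
    sparsity = lam * (U ** p).sum()                      # lam * sum_i ||u_i||_p^p
    return fit + entropy + sparsity
```

With all compatibilities equal to zero the three terms vanish, which matches the value $J = 0$ used as the reference level in the two-branch update discussed below.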


A. Initialization in SPCM 

First, the initialization of $\boldsymbol{\theta}_j$'s is carried out using the final cluster representatives obtained
from the FCM algorithm, when the latter is executed with $m$ clusters on $X$.

After the initialization of $\boldsymbol{\theta}_j$'s, we initialize $\gamma_j$'s as follows:

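$$\gamma_j = \frac{\sum_{i=1}^{N} u_{ij}^{FCM}\,\|\mathbf{x}_i-\boldsymbol{\theta}_j\|^2}{\sum_{i=1}^{N} u_{ij}^{FCM}}, \quad j = 1,\dots,m \tag{4}$$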




where $\boldsymbol{\theta}_j$'s and $u_{ij}^{FCM}$'s in eq. (4) are the final parameter estimates obtained by FCM.

Finally, we select the parameter $\lambda$ as follows:


$$\lambda = K\,\frac{\bar{\gamma}}{p(1-p)e^{2-p}} \tag{5}$$


where $\bar{\gamma} = \min_{j=1,\dots,m} \gamma_j$ and $K$ is a user-defined constant, which is set equal to $K = 0.9$ for
$p = 0.5$ (see also [?]). The rationale behind this choice is further explained in subsection
III-A where, in addition, appropriate bounds on the values of $K$ are given in terms of $p$.

^4 The condition $u_{ij} > 0$ in eq. (3) is a prerequisite in order for $\ln u_{ij}$ to be well-defined. However, in the sequel,
when referring to $\ln u_{ij}$ for $u_{ij} = 0$, we mean $\lim_{u_{ij}\to 0^+} \ln u_{ij}$. Also, we use the fact that $\lim_{u_{ij}\to 0^+} u_{ij}\ln u_{ij} = 0$.
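The initialization of the $\gamma_j$'s and of $\lambda$ can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes that $\gamma_j$ is the FCM-weighted mean of the squared distances to $\boldsymbol{\theta}_j$ and that $\lambda = K\bar{\gamma}/(p(1-p)e^{2-p})$ with $\bar{\gamma} = \min_j \gamma_j$; the function name and argument names are hypothetical, and the FCM outputs are taken as given inputs.

```python
import numpy as np

def init_gamma_lambda(X, Theta_fcm, U_fcm, K=0.9, p=0.5):
    """Return (gamma, lam) from FCM outputs.
    X: (N, l) data, Theta_fcm: (m, l) FCM centers, U_fcm: (N, m) FCM memberships.
    (Hypothetical helper; the formulas are the assumed forms stated in the text.)"""
    # Squared distances of every point to every FCM representative.
    d = ((X[:, None, :] - Theta_fcm[None, :, :]) ** 2).sum(axis=2)
    # gamma_j: membership-weighted average squared distance for cluster j.
    gamma = (U_fcm * d).sum(axis=0) / U_fcm.sum(axis=0)
    # lambda is tied to the smallest gamma_j through the constant K.
    gamma_bar = gamma.min()
    lam = K * gamma_bar / (p * (1.0 - p) * np.exp(2.0 - p))
    return gamma, lam
```

The default values `K=0.9` and `p=0.5` are the ones quoted in the text.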


B. Updating of $\boldsymbol{\theta}_j$'s and $u_{ij}$'s in SPCM

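Minimizing $J_{SPCM}(U,\Theta)$ with respect to $\boldsymbol{\theta}_j$ leads to the following equation,

$$\boldsymbol{\theta}_j = \frac{\sum_{i=1}^{N} u_{ij}\,\mathbf{x}_i}{\sum_{i=1}^{N} u_{ij}} \tag{6}$$

The derivative of $J_{SPCM}$ with respect to $u_{ij}$ is $f(u_{ij}) = d_{ij} + \gamma_j \ln u_{ij} + \lambda p u_{ij}^{p-1}$, where
$d_{ij} = \|\mathbf{x}_i - \boldsymbol{\theta}_j\|^2$. In [?] it is proved that (a) $f(u_{ij})$ is strictly positive outside $[0,1]$, (b)
$f(u_{ij})$ has a unique minimum at $\hat{u}_{ij} = \left[\frac{\lambda}{\gamma_j}p(1-p)\right]^{\frac{1}{1-p}}$ and (c) $f(u_{ij}) = 0$ has at most
two solutions. More specifically, if $f(\hat{u}_{ij}) < 0$, then $f(u_{ij}) = 0$ has exactly two solutions
$u_{ij}^{\{1\}}, u_{ij}^{\{2\}} \in (0,1)$ with $u_{ij}^{\{1\}} < u_{ij}^{\{2\}}$, the largest of which corresponds to a local minimum of
$J_{SPCM}$ with respect to $u_{ij}$. In [?] it is shown that $J_{SPCM}(U,\Theta)$ exhibits its global minimum
with respect to $u_{ij}$ at $u_{ij}^*$, where:

$$u_{ij}^* = \begin{cases} u_{ij}^{\{2\}}, & \text{if } f(\hat{u}_{ij}) < 0 \text{ and } u_{ij}^{\{2\}} \geq \left(\frac{\lambda(1-p)}{\gamma_j}\right)^{\frac{1}{1-p}} \\ 0, & \text{otherwise} \end{cases} \tag{7}$$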

Clearly, if $f(u_{ij}) = 0$ has no solutions, then $f(u_{ij})$ will be positive for all valid values of $u_{ij}$
(see Fig. 1c). Thus $J_{SPCM}$ will be strictly increasing with respect to $u_{ij}$ and it will be minimized
at 0. Thus, we set $u_{ij}^* = 0$. Note that the right-most inequality in the first branch of eq. (7)^5 turns out to
be equivalent to $J_{SPCM}(\boldsymbol{\theta}_j, u_{ij}^{\{2\}}) \leq J_{SPCM}(\boldsymbol{\theta}_j, 0) = 0$, where $J_{SPCM}(\boldsymbol{\theta}_j, u_{ij})$ contains the
terms of $J_{SPCM}(U,\Theta)$ that involve only $\boldsymbol{\theta}_j$ and $u_{ij}$ ([?]). All the above possible cases are
depicted in Fig. 1.

To determine $u_{ij}^{\{2\}}$ we solve $f(u_{ij}) = 0$ as follows. First, we determine $\hat{u}_{ij}$ and check
whether $f(\hat{u}_{ij}) > 0$. If this is the case, then $f(u_{ij})$ has no roots in $[0,1]$. Note that, in this
case, it is $f(u_{ij}) > 0$ for all $u_{ij} \in (0,1]$, since $f(\hat{u}_{ij}) > 0$ (see Fig. 1c). Thus, $J_{SPCM}$ is
increasing with respect to $u_{ij}$ in $(0,1]$ (see Fig. 1d). Consequently, in this case we set $u_{ij}^* = 0$,
imposing sparsity. In the rare case where $f(\hat{u}_{ij}) = 0$, we set $u_{ij}^* = 0$, as $\hat{u}_{ij}$ is the unique
root of $f(u_{ij}) = 0$ and $f(u_{ij}) > 0$ for $u_{ij} \in (0,\hat{u}_{ij}) \cup (\hat{u}_{ij}, 1]$. If $f(\hat{u}_{ij}) < 0$, then $f(u_{ij}) = 0$
has exactly two solutions that both lie in $[0,1]$ (see Figs. 1a, 1e). In order to determine the
largest of the solutions, $u_{ij}^{\{2\}}$, we apply the bisection method (see e.g. [?]) in the range
$(\hat{u}_{ij}, 1]$, as $u_{ij}^{\{2\}}$ is greater than $\hat{u}_{ij}$. The bisection method halves the search interval at every
iteration and thus converges rapidly to the largest of the two solutions of $f(u_{ij}) = 0$. If
the obtained solution $u_{ij}^{\{2\}}$ satisfies the rightmost condition in the first branch of eq. (7), then
we set $u_{ij}^* = u_{ij}^{\{2\}}$ (see Fig. 1b), as is shown in [?]. Otherwise, $u_{ij}^*$ is set to 0 (see Fig. 1f).

Fig. 1: In all plots the dashed parts of the graphs correspond to the interval $(0, u^{min})$, which is not
accessible by the algorithm (see eq. (8)). (a) The shape of function $f(u_{ij})$ when $f(\hat{u}_{ij}) < 0$ and the
right-most condition of eq. (7) is satisfied and (b) the corresponding shape of the cost function $J(u_{ij})$.
(c) The shape of function $f(u_{ij})$ when $f(\hat{u}_{ij}) > 0$ and (d) the corresponding shape of $J(u_{ij})$. (e) The
shape of function $f(u_{ij})$ when $f(\hat{u}_{ij}) < 0$ and the right-most condition of eq. (7) is not satisfied and (f)
the corresponding shape of $J(u_{ij})$.

^5 In its original version, the second inequality in the first branch was strict. Here, we change it to a non-strict one.
Although this slight modification has no implications for the behavior of the algorithm in practice, it turns out to be
important for the establishment of the theoretical results given below.
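The update of a single degree of compatibility described above can be sketched as follows. This is a minimal illustration of the two-branch rule with bisection, not the authors' code: the function names `f` and `update_u` are hypothetical, the root is bracketed in $(\hat{u}_{ij}, 1]$ (where $f$ is increasing, negative at $\hat{u}_{ij}$ and positive at 1), and the threshold used in the final test is the one written as $(\lambda(1-p)/\gamma_j)^{1/(1-p)}$.

```python
import math

def f(u, d, gamma, lam, p):
    """Derivative of the cost with respect to one compatibility u in (0, 1]."""
    return d + gamma * math.log(u) + lam * p * u ** (p - 1.0)

def update_u(d, gamma, lam, p, n_bisect=30):
    """Two-branch update for one (point, cluster) pair, given the squared
    distance d. Returns the largest root of f if it passes the threshold
    test, and 0 otherwise. (Hypothetical helper.)"""
    # Location of the unique minimum of f.
    u_hat = (lam * p * (1.0 - p) / gamma) ** (1.0 / (1.0 - p))
    # No root, or a single tangential root: sparsity is imposed.
    if f(u_hat, d, gamma, lam, p) >= 0.0:
        return 0.0
    # Bisection for the largest root on (u_hat, 1]; f(u_hat) < 0 < f(1).
    lo, hi = u_hat, 1.0
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if f(mid, d, gamma, lam, p) < 0.0:
            lo = mid
        else:
            hi = mid
    u2 = 0.5 * (lo + hi)
    # Keep the root only if it does not fall below the threshold.
    u_min = (lam * (1.0 - p) / gamma) ** (1.0 / (1.0 - p))
    return u2 if u2 >= u_min else 0.0
```

For example, with `gamma = 1`, `p = 0.5` and `lam = 0.9 / (0.25 * math.exp(1.5))`, a point at zero distance receives a strictly positive compatibility, while a point at squared distance 5 receives 0.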


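A vital observation is that, as long as $u_{ij}$ is given by the first branch of eq. (7), its values
are bounded as follows

$$u_j^{min} \equiv \left(\frac{\lambda(1-p)}{\gamma_j}\right)^{\frac{1}{1-p}} \leq u_{ij} \leq u_j^{max} \tag{8}$$

where $u_j^{max}$ is obtained by solving the equation $f(u_{ij}) = 0$ for $d_{ij} = 0$; that is, the equation
$\gamma_j \ln u_{ij} + \lambda p u_{ij}^{p-1} = 0$. Note that both $u_j^{min}$ and $u_j^{max}$ depend exclusively on $\lambda$, $\gamma_j$ and $p$.
Before we proceed, we will give an alternative expression for eq. (7), which will be
extensively exploited in the convergence proof below. More specifically, we will express
the condition of the first branch of (7) in terms of $\boldsymbol{\theta}_j$. To this end, we consider the case
where $u_{ij}^{\{2\}} = u_j^{min}$. This implies that $f(u_{ij}^{\{2\}}) = 0$ or, equivalently, $f(u_j^{min}) = 0$. Substituting $u_j^{min}$ by its
equal given in eq. (7) and after some straightforward algebraic manipulations, it follows that
$f(u_j^{min}) = 0$ is equivalent to

$$\|\mathbf{x}_i - \boldsymbol{\theta}_j\|^2 = \frac{\gamma_j}{1-p}\left(-\ln\frac{\lambda(1-p)}{\gamma_j} - p\right) \equiv R_j^2 \tag{9}$$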


The above is the equation of a hypersphere, denoted by $C_{ij}$, centered at $\mathbf{x}_i$ and having radius
$R_j$ (note that $R_j$ depends exclusively on the parameters $\gamma_j$, $p$, $\lambda$ and not on the data points
$\mathbf{x}_i$ or on $\boldsymbol{\theta}_j$'s and $u_{ij}$'s). Clearly, its interior $int(C_{ij})$ (which in the subsequent analysis is
assumed to contain $C_{ij}$ itself) contains all the positions of $\boldsymbol{\theta}_j$ which give $u_{ij} > 0$, while all
the points in its exterior $ext(C_{ij})$ correspond to positions of $\boldsymbol{\theta}_j$ that give $u_{ij} = 0$. In order
to ensure that $C_{ij}$ is properly defined, we should ensure that $R_j$ is positive. This holds true
if $K$ is chosen so that $K < p\,e^{2(1-p)}$ (see Proposition A1 in Appendix). In the light of the
above result, eq. (7) can be rewritten as follows

$$u_{ij}^* = \begin{cases} u_{ij}^{\{2\}}, & \text{if } \|\mathbf{x}_i - \boldsymbol{\theta}_j\|^2 \leq R_j^2 \\ 0, & \text{otherwise} \end{cases} \tag{10}$$


Note that the expressions for $u_{ij}^*$ given by eqs. (7) and (10) are equivalent and will be used
interchangeably in the subsequent analysis.
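The relation between the threshold on $u_{ij}$ and the radius $R_j$ can be checked numerically. The sketch below is illustrative only: it assumes the expression $R_j^2 = \frac{\gamma_j}{1-p}\left(-\ln\frac{\lambda(1-p)}{\gamma_j} - p\right)$ and verifies that, at squared distance $R_j^2$, the derivative $f$ vanishes exactly at the threshold value $(\lambda(1-p)/\gamma_j)^{1/(1-p)}$. The function names are hypothetical.

```python
import math

def radius_sq(gamma, lam, p):
    """Squared radius R_j^2 of the hypersphere inside which u_ij > 0
    (assumed closed form, see the lead-in)."""
    return gamma / (1.0 - p) * (-math.log(lam * (1.0 - p) / gamma) - p)

def f_at(u, d, gamma, lam, p):
    """Derivative of the cost with respect to u, at squared distance d."""
    return d + gamma * math.log(u) + lam * p * u ** (p - 1.0)

# Example values chosen only for illustration.
gamma, lam, p = 1.0, 0.8, 0.5
u_min = (lam * (1.0 - p) / gamma) ** (1.0 / (1.0 - p))
R2 = radius_sq(gamma, lam, p)
residual = f_at(u_min, R2, gamma, lam, p)   # expected to vanish up to rounding
```

A positive `R2` for the chosen parameters indicates that the hypersphere is properly defined; a non-positive value would mean that no position of the representative yields a nonzero compatibility.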


C. The SPCM algorithm 

Taking into account the previous short description of its main features, the SPCM algorithm
is summarized as follows.


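Algorithm 1 $[\Theta, \Gamma, U]$ = SPCM($X$, $m$)

Input: $X$, $m$
1: $t = 0$
   ▷ Initialization of $\boldsymbol{\theta}_j$'s part
2: Initialize: $\boldsymbol{\theta}_j(t)$ via FCM algorithm
   ▷ Initialization of $\gamma_j$'s part
3: Set: $\gamma_j = \frac{\sum_{i=1}^{N} u_{ij}^{FCM}\|\mathbf{x}_i - \boldsymbol{\theta}_j(t)\|^2}{\sum_{i=1}^{N} u_{ij}^{FCM}}$, $j = 1,\dots,m$
4: Set: $\lambda = K\frac{\bar{\gamma}}{p(1-p)e^{2-p}}$, where $\bar{\gamma} = \min_{j=1,\dots,m} \gamma_j$
5: repeat
   ▷ Update $U$ part
6:   Update $U(t)$ via eq. (7), as described in the text
   ▷ Update $\Theta$ part
7:   $\boldsymbol{\theta}_j(t+1) = \sum_{i=1}^{N} u_{ij}(t)\,\mathbf{x}_i \Big/ \sum_{i=1}^{N} u_{ij}(t)$, $j = 1,\dots,m$
8:   $t = t + 1$
9: until the change in $\boldsymbol{\theta}_j$'s between two successive iterations becomes sufficiently small
10: return $\Theta$, $\Gamma = \{\gamma_1,\dots,\gamma_m\}$, $U$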


It is noted that after the termination of the algorithm an additional step is required, in
order to identify and remove possibly duplicated clusters.
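The main loop of the algorithm can be sketched in a few lines. The following is a simplified illustration rather than the authors' implementation: the FCM-based initialization is replaced by user-supplied initial representatives `Theta0` and parameters `gamma` (an assumption made to keep the example short), $\lambda$ is set with the assumed form $K\bar{\gamma}/(p(1-p)e^{2-p})$, the final duplicate-removal step is omitted, and all function names are hypothetical.

```python
import math
import numpy as np

def _update_u(d, gamma, lam, p, n_bisect=30):
    """Two-branch update of one compatibility (largest root of f, or 0)."""
    f = lambda u: d + gamma * math.log(u) + lam * p * u ** (p - 1.0)
    u_hat = (lam * p * (1.0 - p) / gamma) ** (1.0 / (1.0 - p))
    if f(u_hat) >= 0.0:
        return 0.0
    lo, hi = u_hat, 1.0
    for _ in range(n_bisect):          # bisection on (u_hat, 1]
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    u2 = 0.5 * (lo + hi)
    u_min = (lam * (1.0 - p) / gamma) ** (1.0 / (1.0 - p))
    return u2 if u2 >= u_min else 0.0

def spcm(X, Theta0, gamma, K=0.9, p=0.5, tol=1e-6, max_iter=100):
    """Alternate the U-update and the Theta-update until Theta stabilizes."""
    Theta = np.asarray(Theta0, dtype=float).copy()
    lam = K * gamma.min() / (p * (1.0 - p) * math.exp(2.0 - p))
    N, m = X.shape[0], Theta.shape[0]
    U = np.zeros((N, m))
    for _ in range(max_iter):
        d = ((X[:, None, :] - Theta[None, :, :]) ** 2).sum(axis=2)
        for i in range(N):
            for j in range(m):
                U[i, j] = _update_u(d[i, j], gamma[j], lam, p)
        new_Theta = Theta.copy()
        for j in range(m):
            s = U[:, j].sum()
            if s > 0.0:                # a cluster with no active point stays put
                new_Theta[j] = U[:, j] @ X / s
        shift = np.abs(new_Theta - Theta).max()
        Theta = new_Theta
        if shift < tol:
            break
    return Theta, U
```

On two well-separated groups of points, each representative is expected to settle near the center of its own group, while the compatibilities of the points of the other group are exactly zero.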

The worst case computational complexity of (the main body of) SPCM is $O((\epsilon + 2)Nm \cdot iter)$,
where $\epsilon$ is the number of iterations in the bisection method (which have very light
computational complexity)^6 and $iter$ is the number of iterations performed by the algorithm.
Note, however, that the actual complexity is much less, since at each iteration the bisection
method is activated only for a small fraction of $u_{ij}$'s. As it is shown experimentally in [?],
the computational complexity of SPCM is slightly increased compared to that of PCM. This
is the price to pay for the better quality results of SPCM compared to PCM.


III. Convergence proof of the SPCM

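In the sequel, a proof of the convergence of the SPCM is provided. Note that, in principle,
the proof holds for any choice of (fixed) $\gamma_j$'s, not only for the one given in eq. (4).

Before we proceed, we note that the cost function associated with SPCM (eq. (3)) can be
recast as

$$J_{SPCM}(U,\Theta) = \sum_{j=1}^{m} J_j(\mathbf{u}_j, \boldsymbol{\theta}_j) = \sum_{j=1}^{m}\sum_{i=1}^{N} \underbrace{\left[u_{ij}\|\mathbf{x}_i - \boldsymbol{\theta}_j\|^2 + \gamma_j(u_{ij}\ln u_{ij} - u_{ij}) + \lambda u_{ij}^p\right]}_{h(u_{ij},\boldsymbol{\theta}_j)} \tag{11}$$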


where $\mathbf{u}_j = [u_{1j},\dots,u_{Nj}]^T$. Since (a) $u_{ij}$'s, $j = 1,\dots,m$, are not interrelated to each other,
for a specific $\mathbf{x}_i$, (b) $u_{ij}$'s, $i = 1,\dots,N$, are related exclusively with $\boldsymbol{\theta}_j$ and vice versa and
(c) $\boldsymbol{\theta}_j$'s are not interrelated to each other, minimization of $J_{SPCM}(U,\Theta)$ can be considered
as the minimization of $m$ independent cost functions $J_j$'s, $j = 1,\dots,m$. Thus, in the sequel,
we focus on the minimization of a specific $J_j(\mathbf{u}_j, \boldsymbol{\theta}_j)$ and, for ease of notation, we drop
the index $j$, i.e., when we write $J(\mathbf{u}, \boldsymbol{\theta})$, $\mathbf{u} = [u_1,\dots,u_N]^T$, we refer to a $J_j(\mathbf{u}_j, \boldsymbol{\theta}_j)$.

The proof is given under the very mild assumption that, for each cluster, at least one
equation $f(u_i) = 0$, $i = 1,\dots,N$, has two solutions at each iteration of SPCM (Assumption
1). This is a reasonable assumption, since if it does not hold at a certain iteration, the algorithm
cannot identify new locations for $\boldsymbol{\theta}$ at the next iteration. In subsection III-A, it is shown how
this assumption can always be fulfilled.

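Some definitions are now in order. Let $\mathcal{M}$ be the set containing all the $N \times 1$ vectors $\mathbf{u}$
whose elements lie in the union $\{0\} \cup [u^{min}, u^{max}]$, i.e. $\mathcal{M} = (\{0\} \cup [u^{min}, u^{max}])^N$. Also,
let $\mathcal{R}^l$ be the space where the vector $\boldsymbol{\theta}$ lives. The SPCM algorithm produces a sequence
$(\mathbf{u}^{(t)}, \boldsymbol{\theta}^{(t)})|_{t=0}^{\infty}$, which will be examined in terms of its convergence properties.

^6 In our case $\epsilon$ is fixed to 30, which implies an accuracy of $2^{-30} \approx 10^{-9}$.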
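Let

$$G : \mathcal{M} \rightarrow \mathcal{R}^l, \quad \text{with } G(\mathbf{u}) = \boldsymbol{\theta}$$

where $G$ is calculated via the following equation

$$\boldsymbol{\theta} = \frac{\sum_{i=1}^{N} u_i\,\mathbf{x}_i}{\sum_{i=1}^{N} u_i} \tag{12}$$

and

$$F : \mathcal{R}^l \rightarrow \mathcal{M}, \quad \text{with } F(\boldsymbol{\theta}) = \mathbf{u}$$

where $F$ is calculated via eq. (10). Then, the SPCM operator $T : \mathcal{M} \times \mathcal{R}^l \rightarrow \mathcal{M} \times \mathcal{R}^l$ is
defined as

$$T = T_2 \circ T_1 \tag{13}$$

where

$$T_1 : \mathcal{M} \times \mathcal{R}^l \rightarrow \mathcal{M}, \quad T_1(\mathbf{u}, \boldsymbol{\theta}) = F(\boldsymbol{\theta}) \tag{14}$$

and

$$T_2 : \mathcal{M} \rightarrow \mathcal{M} \times \mathcal{R}^l, \quad T_2(\mathbf{u}) = (\mathbf{u}, G(\mathbf{u})) \tag{15}$$

For operator $T$ we have that

$$T(\mathbf{u}, \boldsymbol{\theta}) = (T_2 \circ T_1)(\mathbf{u}, \boldsymbol{\theta}) = T_2(T_1(\mathbf{u}, \boldsymbol{\theta})) = T_2(F(\boldsymbol{\theta})) = (F(\boldsymbol{\theta}), G(F(\boldsymbol{\theta}))) = (F(\boldsymbol{\theta}), (G \circ F)(\boldsymbol{\theta}))$$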

Thus, the iteration of SPCM can be expressed in terms of $T$ as

$$(\mathbf{u}^{(t)}, \boldsymbol{\theta}^{(t)}) = T(\mathbf{u}^{(t-1)}, \boldsymbol{\theta}^{(t-1)}) = (F(\boldsymbol{\theta}^{(t-1)}), (G \circ F)(\boldsymbol{\theta}^{(t-1)}))$$

The above decomposition of $T$ to $T_1$ and $T_2$ will facilitate the subsequent convergence
analysis, since certain properties for $T$ can be proved relying on $T_1$ and $T_2$ (and, ultimately,
on $F$ and $G$).
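The composition $T = T_2 \circ T_1$ can be mirrored directly in code. The sketch below is purely illustrative: `make_T` is a hypothetical helper that takes any two callables playing the roles of $F$ and $G$ and returns the composed operator. It makes explicit that one application of $T$ discards the incoming $\mathbf{u}$ and depends on $\boldsymbol{\theta}$ only.

```python
def make_T(F, G):
    """Build the operator T = T2 o T1 from the maps F (theta -> u) and G (u -> theta).
    (Hypothetical helper, for illustration only.)"""
    def T1(u, theta):
        # T1 ignores the current u and recomputes it from theta.
        return F(theta)

    def T2(u):
        # T2 keeps u and attaches the representative computed from it.
        return (u, G(u))

    def T(u, theta):
        return T2(T1(u, theta))

    return T

# Toy stand-ins for F and G, chosen only to make the data flow visible.
T = make_T(lambda theta: theta + 1, lambda u: 2 * u)
result = T(0, 3)   # F(3) = 4, then G(4) = 8
```

In this toy run the pair returned by `T` is `(F(3), G(F(3)))`, independently of the first argument.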

Remark 1: Note that $F$ (and, as a consequence, $T_1$) is, in general, not continuous (actually,
it is piecewise continuous).
In the sequel some required definitions are given. Let $Z : \mathcal{X} \rightarrow \mathcal{X}$, with $\mathcal{X}$ a subset of a Euclidean
space, be a point-to-point map that gives rise to an iterative algorithm $z(t) = Z(z(t-1))$, which generates
a sequence $z(t)|_{t=0}^{\infty}$, for a given $z(0)$. A fixed point $z^*$ of $Z$ is a point for which $Z(z^*) =
z^*$. Also, we say that $Z$ is strictly monotonic with respect to a (continuous) function $g$ if
$g(Z(z)) < g(z)$, whenever $z$ is not a fixed point of $Z$. Having said the above, we can now
state the following theorem, which will prove useful in the sequel:

Theorem 1 [?]^7: Let $Z : \mathcal{X} \rightarrow \mathcal{X}$, with $\mathcal{X}$ a subset of a Euclidean space, be a point-to-point map
that gives rise to an iterative algorithm $z(t) = Z(z(t-1))$, which generates a sequence $z(t)|_{t=0}^{\infty}$, for a given
$z(0)$. Supposing that:

(i) $Z$ is strictly monotonic with respect to a continuous function $g : \mathcal{X} \rightarrow \mathcal{R}$,

(ii) $Z$ is continuous on $\mathcal{X}$,

(iii) the set of all points $z(t)|_{t=0}^{\infty}$ is bounded and

(iv) the number of fixed points having any given value of $g$ is finite

then

the algorithm corresponding to $Z$ will converge to a fixed point of $Z$ regardless of where it
is initialized in $\mathcal{X}$.^8

In the SPCM case, $Z$ is the mapping $T$ (SPCM operator) defined by eq. (13) and $g$ is the
cost function $J$. Due to the fact that SPCM has resulted from the minimization of $J$,
it turns out that its fixed points $(\mathbf{u}^*, \boldsymbol{\theta}^*)$ satisfy $\nabla J|_{(\mathbf{u}^*, \boldsymbol{\theta}^*)} = \mathbf{0}$.

Although the general strategy to prove convergence for an algorithm is to show that it fulfills
the requirements of the convergence theorem, this cannot be adopted in a straightforward
manner in this framework. The reason is that Theorem 1 requires continuity of $T$, which is
not guaranteed in the SPCM case due to $T_1$ (that is, $F$; see eq. (14)), which is not continuous in
its domain (which is the convex hull of $X$, $CH(X)$).^9 However, it is continuous on certain
subsets of $CH(X)$. This fact will allow the use of Theorem 1 for certain small regions where
continuity is preserved.

^7 This is a direct combination of Theorem 3.1 and Corollary 3.2 in [?].

^8 Actually, this theorem has been stated for the more general case where $Z$ is a one-to-many mapping [?]. The present
form of the theorem is for the special case where $Z$ is a one-to-one mapping, which is the case for SPCM.

^9 Due to its updating (eq. (12)), $\boldsymbol{\theta}$ will always lie in $CH(X)$, provided that its initial position lies in at least one hypersphere
of radius $R$ centered at a data point.
Some additional definitions are now in order. Without loss of generality, let $I = \bigcap_{i=1}^{k} int(C_i)$,
that is, $I$ is the (nonempty) intersection of the interiors of the hyperspheres of radius $R$ (eq.
(9)) that correspond to $\mathbf{x}_i$'s, $i = 1,\dots,k$ (see Fig. 2).^10 Note that for $\boldsymbol{\theta} \in I$ the above $k$
points will have $u_i > 0$. The set of all data points that have $u_i > 0$ form the so-called
active set, while the points themselves are called active points. In addition, an active set $X_q$
is called valid if its corresponding intersection of hyperspheres $I_q$ is nonempty. Finally, the
points with $u_i = 0$ are called inactive.

Fig. 2: An active set of $k = 3$ points in cases when (a) $\Theta_I \subset I$ and (b) $\Theta_I \not\subset I$.

Let also

$$U_I = \{\mathbf{u} = [u_1,\dots,u_k] : \mathbf{u} = F(\boldsymbol{\theta}), \text{ for } \boldsymbol{\theta} \in I\} \tag{16}$$

be the set containing all possible values of the degrees of compatibility, $u_i$, of $\boldsymbol{\theta}$ with the $k$
active $\mathbf{x}_i$'s. Clearly, $u_i$'s are computed via the first branch of eq. (10) and $F$ is continuous
in this specific case (as it will be explicitly shown later). Also, let

$$\Theta_I = \{\boldsymbol{\theta} : \boldsymbol{\theta} = G(\mathbf{u}), \text{ for } \mathbf{u} \in U_I\} \tag{17}$$

(see Fig. 2 for the possible scenarios for $\Theta_I$). Three observations are now in order:

^10 Clearly, by reordering the data points we can take all the possible corresponding $I$ intersections.
• First, due to the fact that $u_i$'s are independent from each other, $U_I$ can also be expressed


as 



(18) 


where $\prod$ denotes the Cartesian product and $u_i^{max}$ is the maximum possible value $u_i$ can
take, provided that $\boldsymbol{\theta} \in I$ (clearly, $u_i^{max} \leq u^{max}$).

• If at a certain iteration $t$ of SPCM, $\boldsymbol{\theta}(t) \in I$, then $\Theta_I$ contains all possible positions of
$\boldsymbol{\theta}(t+1)$.

• $\Theta_I$ always lies in the convex hull of the associated active set.

In the sequel, we proceed by showing the following facts, which are preliminary for the
establishment of the final convergence result. Specifically, we will show that

• (A) $J(\mathbf{u}, \boldsymbol{\theta})$ decreases at each iteration of the SPCM operator $T$.

• (B) $T$ is continuous on every region $U_I \times I$ that corresponds to a valid active set.

• (C) The sequence produced by the algorithm is bounded.

• (D) The fixed points corresponding to a certain valid active set (if they exist) are strict
local minima of $J$ and their number is finite.

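1) Proof of item (A): To achieve this goal, we first prove the following two lemmas.

Lemma 1: Let $\phi : \mathcal{M} \rightarrow \mathcal{R}$, $\phi(\mathbf{u}) = J(\mathbf{u}, \boldsymbol{\theta})$, where $\boldsymbol{\theta}$ is fixed. Then $\mathbf{u}^*$ is the global
minimum solution of $\phi$ if and only if $\mathbf{u}^* = F(\boldsymbol{\theta})$, where $F$ is defined as in eq. (7).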

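Proof: We proceed by showing that

(a) the unique point $\mathbf{u}^*$ that satisfies the KKT conditions for the minimization problem

$$\min_{\mathbf{u}} \phi(\mathbf{u}) \quad \text{subject to } u_i \geq 0,\ i = 1,\dots,N, \text{ and } 1 - u_i \geq 0,\ i = 1,\dots,N \tag{19}$$

is the one determined by eq. (7) and

(b) this point is a minimizer of $J$, which implies (due to the uniqueness) that it is the global
minimizer.

Let $\mathbf{u}^* = [u_i^*]$ be a point that satisfies the KKT conditions for (19). Then we have

$$(i)\ u_i^* \geq 0, \quad (ii)\ 1 - u_i^* \geq 0 \tag{20}$$

$$(i)\ \exists\, \kappa_i \geq 0 : \kappa_i u_i^* = 0, \quad (ii)\ \exists\, \tau_i \geq 0 : \tau_i(1 - u_i^*) = 0 \tag{21}$$

and

$$\left.\frac{\partial \mathcal{L}(\mathbf{u})}{\partial u_i}\right|_{\mathbf{u}=\mathbf{u}^*} = 0 \tag{22}$$

where $\mathcal{L}(\mathbf{u})$ is the Lagrangian function defined as

$$\mathcal{L}(\mathbf{u}) = \phi(\mathbf{u}) - \sum_{i=1}^{N} \kappa_i u_i - \sum_{i=1}^{N} \tau_i(1 - u_i) \tag{23}$$

Recalling eq. (11), $\phi(\mathbf{u})$ can be written as

$$\phi(\mathbf{u}) = \sum_{i=1}^{N} \underbrace{\left[u_i\|\mathbf{x}_i - \boldsymbol{\theta}\|^2 + \gamma(u_i \ln u_i - u_i) + \lambda u_i^p\right]}_{h(u_i;\boldsymbol{\theta})} \tag{24}$$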


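where $h(u_i;\boldsymbol{\theta})$ is a function of $u_i$ for a fixed value of $\boldsymbol{\theta}$. Noting that all $u_i$'s are computed
independently from each other, for fixed $\boldsymbol{\theta}$, it is easy to verify that, for a specific $u_i$, it is

$$\frac{\partial \phi(\mathbf{u})}{\partial u_i} = \frac{\partial h(u_i;\boldsymbol{\theta})}{\partial u_i} = \|\mathbf{x}_i - \boldsymbol{\theta}\|^2 + \gamma \ln u_i + \lambda p u_i^{p-1} \equiv f(u_i)$$

As a consequence, eq. (22) gives

$$\|\mathbf{x}_i - \boldsymbol{\theta}\|^2 + \gamma \ln u_i^* + \lambda p (u_i^*)^{p-1} - \kappa_i + \tau_i = 0 \tag{25}$$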


We will prove next that $\kappa_i = 0$ and $\tau_i = 0$, for $i = 1,\dots,N$; that is, the constraints on $u_i$'s
are inactive, i.e., the optimum of $\phi(\mathbf{u})$ lies always in the region defined by the constraints.
Assume, on the contrary, that there exists $\kappa_s > 0$. From eq. (21)-(i) it follows that $u_s^* = 0$ and
from eq. (21)-(ii) that $\tau_s = 0$. Taking into account that $\lim_{u_s^* \to 0^+}\left(\gamma \ln u_s^* + \lambda p (u_s^*)^{p-1}\right) = +\infty$^11
and applying eq. (25) for $u_s^*$ we have

$$\|\mathbf{x}_s - \boldsymbol{\theta}\|^2 + \infty = \kappa_s \quad \text{or} \quad \kappa_s = +\infty \tag{26}$$

which contradicts the fact that $\kappa_s$ is finite.

^11 Utilization of L'Hôpital's rule gives that $\lim_{x \to 0^+} x^{1-p}\ln x = 0$ ($p < 1$). Then $\lim_{x \to 0^+}\left(\ln x + \frac{\beta}{x^{1-p}}\right) =
\lim_{x \to 0^+} \frac{x^{1-p}\ln x + \beta}{x^{1-p}} = +\infty$, for $\beta > 0$. Setting $x = u_s$, $\beta = \frac{\lambda p}{\gamma}$, the claim follows.

Assume next that there exists $\tau_s > 0$. From eq. (21)-(ii) it follows that $u_s^* = 1$ and from
eq. (21)-(i), it is $\kappa_s = 0$. Applying eq. (25) for $u_s^*$ and substituting the above we have

$$\|\mathbf{x}_s - \boldsymbol{\theta}\|^2 + \gamma \ln 1 + \lambda p 1^{p-1} + \tau_s = 0 \quad \text{or} \quad \tau_s = -\|\mathbf{x}_s - \boldsymbol{\theta}\|^2 - \lambda p < 0 \tag{27}$$

which contradicts the fact that $\tau_s > 0$. Thus $\tau_s = 0$.

Since $\kappa_i = \tau_i = 0$, for all $i$, eq. (25) becomes

$$\|\mathbf{x}_i - \boldsymbol{\theta}\|^2 + \gamma \ln u_i^* + \lambda p (u_i^*)^{p-1} \equiv f(u_i^*) = 0, \quad i = 1,\dots,N \tag{28}$$


Note that the algorithm relies on eq. (28) in order to derive the updating formula of eq. (7)
(thus step (a) has been shown). We proceed now to show that the point corresponding to
eq. (7) (derived through eq. (28)) minimizes $J$. We consider the following two cases:

• $u_i^*$ is given by the first branch of eq. (7). This implies that $f(u_i) = 0$ has two solutions
$u_i^{\{1\}}$ and $u_i^{\{2\}}$ ($u_i^{\{1\}} < u_i^{\{2\}}$) and $u_i^{\{2\}} \geq \left(\frac{\lambda(1-p)}{\gamma}\right)^{\frac{1}{1-p}}$ ($= u^{min}$) (Figs. 1a, 1b). Taking into
account the definition of $h(u_i;\boldsymbol{\theta})$ in eq. (24), it can be shown (Proposition 5, [?]) that the
maximum of the two solutions $u_i^{\{1\}}$, $u_i^{\{2\}}$ ($u_i^{\{1\}} < u_i^{\{2\}}$) is the one that minimizes $h(u_i;\boldsymbol{\theta})$
and, as a consequence, $\phi(\mathbf{u})$ also (which equals $J(\mathbf{u}, \boldsymbol{\theta})$ with $\boldsymbol{\theta}$ fixed).

• $u_i^*$ is given by the second branch of eq. (7). In this case we have that either (i) $f(u_i)$
is strictly positive, which implies that $J(\mathbf{u}, \boldsymbol{\theta})$ is strictly increasing with respect to $u_i$ (case
shown in Figs. 1c, 1d) or (ii) $h(u_i^{\{2\}};\boldsymbol{\theta}) > h(0;\boldsymbol{\theta}) = 0$ (case shown in Figs. 1e, 1f). In
both (i) and (ii) cases, $J(\mathbf{u}, \boldsymbol{\theta})$ is minimized with respect to $u_i$ only for $u_i = 0$ (the second
branch of eq. (7)).

From the above, it follows that $\mathbf{u}^*$ is the global minimum solution of $\phi$ if and only if $\mathbf{u}^*$
is given by eq. (7). Q.E.D.

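Lemma 2: Let $\psi : \mathcal{R}^l \rightarrow \mathcal{R}$, with $\psi(\boldsymbol{\theta}) = J(\mathbf{u}, \boldsymbol{\theta})$, with $\mathbf{u} \in U_I$ being fixed. Then, $\boldsymbol{\theta}^*$
($\in \Theta_I$) is the unique global minimum of $\psi$ if and only if $\boldsymbol{\theta}^* = G(\mathbf{u})$, where $G$ is calculated
as in eq. (12).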

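Proof: In contrast to the situation in Lemma 1, the minimization of $\psi(\boldsymbol{\theta})$ with respect to
$\boldsymbol{\theta}$ is an unconstrained optimization problem. The stationary points of $\psi(\boldsymbol{\theta})$ are obtained as
the solutions of the equations

$$\frac{\partial \psi}{\partial \boldsymbol{\theta}} = \frac{\partial}{\partial \boldsymbol{\theta}}\left[\sum_{i=1}^{N}\left(u_i\|\mathbf{x}_i - \boldsymbol{\theta}\|^2 + \gamma(u_i \ln u_i - u_i) + \lambda u_i^p\right)\right] = 2\sum_{i=1}^{N} u_i(\boldsymbol{\theta} - \mathbf{x}_i) = \mathbf{0}, \tag{29}$$

which, after some manipulations, give

$$\boldsymbol{\theta}^* = \frac{\sum_{i=1}^{N} u_i\,\mathbf{x}_i}{\sum_{i=1}^{N} u_i} \tag{30}$$

Also, it is

$$H_\psi \equiv \frac{\partial^2 \psi}{\partial \boldsymbol{\theta}^2} = 2\sum_{i=1}^{N} u_i\, I_l \tag{31}$$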

where $I_l$ is the $l \times l$ identity matrix. Under Assumption 1, stating that at least one $u_i$ is
computed by the first branch of eq. (7), it is $\sum_{i=1}^{N} u_i > 0$, so that the Hessian $H_\psi$ is positive
definite. Therefore, $\psi$ is a (strictly) convex function over
$\mathcal{R}^l$, with a unique stationary point, given by eq. (30), which is the unique global minimum
of $\psi(\boldsymbol{\theta})$. Q.E.D.
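The statement of Lemma 2 can be checked numerically on a small example. The sketch below is illustrative and uses hypothetical names: `psi` keeps only the $\boldsymbol{\theta}$-dependent part of $J$ (the remaining terms are constant for fixed $\mathbf{u}$), and `G` is the weighted mean. The excess of `psi` over its minimum equals $\left(\sum_i u_i\right)\|\boldsymbol{\theta} - \boldsymbol{\theta}^*\|^2$, which is strictly positive away from $\boldsymbol{\theta}^*$ whenever $\sum_i u_i > 0$.

```python
import numpy as np

def psi(theta, X, u):
    """Theta-dependent part of J for fixed u: sum_i u_i ||x_i - theta||^2."""
    return float((u * ((X - theta) ** 2).sum(axis=1)).sum())

def G(u, X):
    """Weighted mean of the data points, i.e. the stationary point of psi."""
    return u @ X / u.sum()

rng = np.random.default_rng(0)            # fixed seed, illustrative data only
X = rng.normal(size=(6, 2))
u = rng.uniform(0.2, 0.9, size=6)
theta_star = G(u, X)

# Excess cost at a few perturbed positions, next to its closed-form value.
gaps = []
for _ in range(5):
    theta = theta_star + rng.normal(size=2)
    gap = psi(theta, X, u) - psi(theta_star, X, u)
    closed_form = u.sum() * ((theta - theta_star) ** 2).sum()
    gaps.append((gap, closed_form))
```

Each pair in `gaps` holds the measured excess cost and the value predicted by the quadratic identity; the two agree up to rounding.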

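Combining now the previous two lemmas, we are in a position to prove the following
lemma.

Lemma 3: Consider a valid active set, whose corresponding hyperspheres intersection is
denoted by $I$. Let

$$S = \{(\mathbf{u}, \boldsymbol{\theta}) = ([u_1,\dots,u_k], \boldsymbol{\theta}) \in U_I \times I : \nabla J|_{(\mathbf{u},\boldsymbol{\theta})} = \mathbf{0} \text{ with } u_i \text{ being the largest of the two solutions of } f_{\boldsymbol{\theta}}(u_i) = 0,\ i = 1,\dots,k\} \tag{32}$$

(see footnote 12). Then $J$ is continuous over $U_I \times I$ and

$$J(T(\mathbf{u}, \boldsymbol{\theta})) < J(\mathbf{u}, \boldsymbol{\theta}), \quad \text{if } (\mathbf{u}, \boldsymbol{\theta}) \notin S$$

Proof: Since $\{y \rightarrow \|y\|^2\}$, $\{y \rightarrow \ln y\}$, $\{y \rightarrow y^p\}$ are continuous and $J$ is a sum of
products of such functions, it follows that $J$ is continuous on $U_I \times I$. Let $(\mathbf{u}, \boldsymbol{\theta}) \notin S$.
Recalling that

$$T(\mathbf{u}, \boldsymbol{\theta}) = (F(\boldsymbol{\theta}), (G \circ F)(\boldsymbol{\theta})) = (F(\boldsymbol{\theta}), G(F(\boldsymbol{\theta})))$$

we have

$$J(T(\mathbf{u}, \boldsymbol{\theta})) = J((F(\boldsymbol{\theta}), G(F(\boldsymbol{\theta})))) \tag{33}$$

^12 In the sequel, we insert $\boldsymbol{\theta}$ as subscript in the notation of $f$ in order to show explicitly the dependence of $u_i$ on $\boldsymbol{\theta}$.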
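Applying Lemma 1 for fixed $\boldsymbol{\theta}$, we have that $F(\boldsymbol{\theta})$ is the unique global minimizer of $J$.
Thus,

$$J(F(\boldsymbol{\theta}), \boldsymbol{\theta}) \leq J(\mathbf{u}, \boldsymbol{\theta}) \tag{34}$$

Applying Lemma 2 for fixed $F(\boldsymbol{\theta})$, we have that $G(F(\boldsymbol{\theta}))$ is the unique global minimizer
of $J$. Thus, it is

$$J(F(\boldsymbol{\theta}), G(F(\boldsymbol{\theta}))) \leq J(F(\boldsymbol{\theta}), \boldsymbol{\theta}) \tag{35}$$

Since $(\mathbf{u}, \boldsymbol{\theta}) \notin S$, either $\mathbf{u} \neq F(\boldsymbol{\theta})$ or $\boldsymbol{\theta} \neq G(F(\boldsymbol{\theta}))$, so, by the uniqueness of the two
minimizers, at least one of the inequalities (34), (35) is strict. From eqs. (33), (34) and (35), it follows that

$$J(T(\mathbf{u}, \boldsymbol{\theta})) < J(\mathbf{u}, \boldsymbol{\theta}), \quad \text{for } (\mathbf{u}, \boldsymbol{\theta}) \notin S$$

Q.E.D.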

Remark 2: It is noted that although the above proof has focused on the $k$ (active)
points, its generalization that also takes into account the rest of the data points is straightforward,
since $u_i = 0$, for $i = k+1,\dots,N$, and the corresponding terms $h(u_i;\boldsymbol{\theta})$ that contribute to $J$
are 0.

Remark 3: Taking into account that SPCM has resulted from the minimization of $J$
($\nabla J|_{(\mathbf{u},\boldsymbol{\theta})} = \mathbf{0}$) on a $U_I \times I$ corresponding to an active set, it follows that $S$ contains all the
fixed points of $T$, which (as will be shown later) are local minima of the cost function $J$ (of
course, $J$ may have additional local minima, other than those that belong to $S$, which are not accessible
by the algorithm).

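Now we proceed by showing that $T$ decreases $J$ in the whole domain $(\{0\} \cup [u^{min}, u^{max}])^N \times
CH(X)$.

Lemma 4: The strict monotonically decreasing property of $T$ with respect to $J$ remains
valid in the domain $(\{0\} \cup [u^{min}, u^{max}])^N \times CH(X)$, excluding the fixed points of $T$ of each
valid active set.

Proof: Let $(\mathbf{u}, \boldsymbol{\theta})$ be the outcome of SPCM at a specific iteration, $\hat{\mathbf{u}} = F(\boldsymbol{\theta})$ be the $\mathbf{u}$ for
the next iteration and $\hat{\boldsymbol{\theta}} = G(\hat{\mathbf{u}})$ be the subsequent $\boldsymbol{\theta}$. Recall that the ordering of the updating
is

$$\mathbf{u} \rightarrow \boldsymbol{\theta} \rightarrow \hat{\mathbf{u}} \rightarrow \hat{\boldsymbol{\theta}} \tag{36}$$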
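We define

$$\Gamma = \{i : u_i \text{ is computed via the second branch of eq. (7)}\}$$

and

$$\hat{\Gamma} = \{i : \hat{u}_i \text{ is computed via the second branch of eq. (7)}\}$$

Recalling that $h(u_i;\boldsymbol{\theta}) = u_i\|\mathbf{x}_i - \boldsymbol{\theta}\|^2 + \gamma(u_i \ln u_i - u_i) + \lambda u_i^p$, we can write

$$J(\mathbf{u}, \boldsymbol{\theta}) = \underbrace{\sum_{i \in \Gamma \cap \hat{\Gamma}} h(u_i;\boldsymbol{\theta})}_{A_1} + \underbrace{\sum_{i \in \sim\Gamma \cap \hat{\Gamma}} h(u_i;\boldsymbol{\theta})}_{A_2} + \underbrace{\sum_{i \in \sim\hat{\Gamma}} h(u_i;\boldsymbol{\theta})}_{A_3} \tag{37}$$

and

$$J(\hat{\mathbf{u}}, \boldsymbol{\theta}) = \underbrace{\sum_{i \in \Gamma \cap \hat{\Gamma}} h(\hat{u}_i;\boldsymbol{\theta})}_{\hat{A}_1} + \underbrace{\sum_{i \in \sim\Gamma \cap \hat{\Gamma}} h(\hat{u}_i;\boldsymbol{\theta})}_{\hat{A}_2} + \underbrace{\sum_{i \in \sim\hat{\Gamma}} h(\hat{u}_i;\boldsymbol{\theta})}_{\hat{A}_3} \tag{38}$$

where $\sim\Gamma$ denotes the complement of $\Gamma$.

Focusing on $A_1$ and $\hat{A}_1$, we have that $h(u_i;\boldsymbol{\theta}) = h(\hat{u}_i;\boldsymbol{\theta}) = 0$, since $i \in \Gamma \cap \hat{\Gamma}$. Thus

$$A_1 = \hat{A}_1 = 0 \tag{39}$$

Considering $A_2$ and $\hat{A}_2$, since $i \in \hat{\Gamma}$, we have $\hat{u}_i = 0$. Thus, taking into account the order
of updating (eq. (36)) and Lemma 1, we have $(0 =)\ h(\hat{u}_i;\boldsymbol{\theta}) \leq h(u_i;\boldsymbol{\theta})$. Thus, it follows
that

$$\hat{A}_2 \leq A_2 \tag{40}$$

Finally, focusing on $A_3$ and $\hat{A}_3$, since $i \in \sim\hat{\Gamma}$, the argumentation of Lemma 1 implies that
the global minimum of $h(u_i;\boldsymbol{\theta})$ is met at $\hat{u}_i = \hat{u}_i^{\{2\}}$. Thus, taking also into account the order
of updating in eq. (36), it is $h(\hat{u}_i;\boldsymbol{\theta}) \leq h(u_i;\boldsymbol{\theta})$. Therefore, it is

$$\hat{A}_3 \leq A_3 \tag{41}$$

Combining eqs. (39), (40) and (41) it follows that

$$J(\hat{\mathbf{u}}, \boldsymbol{\theta}) \leq J(\mathbf{u}, \boldsymbol{\theta}) \tag{42}$$

Also, Lemma 2 gives

$$J(\hat{\mathbf{u}}, \hat{\boldsymbol{\theta}}) \leq J(\hat{\mathbf{u}}, \boldsymbol{\theta}) \tag{43}$$

Combining eqs. (42), (43), we have that

$$J(\hat{\mathbf{u}}, \hat{\boldsymbol{\theta}}) < J(\mathbf{u}, \boldsymbol{\theta})$$

Q.E.D.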

2) Proof of item (B): In the sequel, we give two useful Propositions concerning the
continuity of the $F$ and $G$ mappings. In both Propositions, without loss of generality, we
consider a valid active set, having $\mathbf{x}_i$, $i = 1,\dots,k$, as active points, whose corresponding
hypersphere intersection is denoted by $I$ and $U_I$, $\Theta_I$ are defined via eqs. (16), (17).^13

Proposition 1: The mapping $G$ is continuous on $U_I \times \{0\}^{N-k}$.

Proof: To prove that $G$ is continuous in the $N$ variables $u_i$, note that $G$ is a vector field with the resolution by ($l$) scalar fields, written as

$$G = (G_1, \ldots, G_l) : U_I \times \{0\}^{N-k} \rightarrow \mathcal{R}^l$$

where $G_q : U_I \times \{0\}^{N-k} \rightarrow \mathcal{R}$ is defined as:

$$G_q(u) = \frac{\sum_{i=1}^{N} u_i x_{iq}}{\sum_{i=1}^{N} u_i}, \quad q = 1, \ldots, l \tag{44}$$

Since $\{u_i \rightarrow u_i x_{iq}\}$ is a continuous function and the sum of continuous functions is also continuous, both the numerator and the denominator in eq. (44) are continuous. Under the assumption that $\sum_{i=1}^{N} u_i > 0$, the denominator in eq. (44) never vanishes. Thus, $G_q$ is well-defined in all cases and it is continuous as the quotient of two continuous functions. Therefore, $G$ is continuous in its entire domain. Q.E.D.

Proposition 2: The mapping F is continuous over /. 


Footnote: Considering a valid active set with corresponding hypersphere intersection $I$ and $\Theta_I$ defined as in eqs. (16), (17), it is noted that although $\theta \in I$, this does not necessarily hold for $\hat{\theta}$, as Fig. 2b indicates, since $\hat{\theta} \in \Theta_I$, with $\Theta_I \not\subset I$.




Proof: It suffices to show that $F$ is continuous on the $l$ variables $\theta_q$. $F$ is a vector field with the resolution by ($N$) scalar fields, i.e.,

$$F = (F_1, \ldots, F_N)$$

where each $F_i$ is given by eq. (10).

The mapping $\{\theta \rightarrow \|x_i - \theta\|^2 (\equiv d_i)\}$ is continuous. Let us focus on the $u_i$'s, $i = 1, \ldots, k$, for which $int(C_i)$ contributes to the formation of $I$; that is, on the $u_i$'s given by the first branch of (10). The mapping $\{d_i \rightarrow u_i\}$ is continuous. To see this, note that (since $\gamma$ is constant) the graph of $f(u_i)$ (which is continuous), viewed as a function of $d_i$, is simply shifted upwards or downwards as $d_i$ varies (see Fig. 3). Focusing on the rightmost point where the graph intersects the horizontal axis, it is clear that small variations of $d_i$ cause small variations to $u_i$, which implies the continuity of $\{d_i \rightarrow u_i\}$ in this case.

Let us focus next on the $u_i$'s, $i = k+1, \ldots, N$, for which $int(C_i)$ does not contribute to the formation of $I$; in this case $u_i$ is given by the second branch of (10) and the claim follows trivially. Q.E.D.
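The continuity of $\{d_i \rightarrow u_i\}$ on the first branch can be illustrated numerically. The Python sketch below assumes the single-cluster form $f(u) = d + \gamma \ln u + \lambda p u^{p-1}$ (consistent with eq. (46)) and locates its rightmost root by bisection. The function names, the bisection scheme and the parameter values are illustrative assumptions, not part of the paper, and the branch-selection rule of SPCM is not reproduced here.

```python
import math

def f(u, d, gamma, lam, p):
    # f(u) = d + gamma*ln(u) + lam*p*u^(p-1), assumed single-cluster form.
    return d + gamma * math.log(u) + lam * p * u ** (p - 1.0)

def largest_root(d, gamma, lam, p, tol=1e-12):
    # f attains its minimum at u_bar and is increasing to the right of it.
    u_bar = (lam * p * (1.0 - p) / gamma) ** (1.0 / (1.0 - p))
    if u_bar >= 1.0 or f(u_bar, d, gamma, lam, p) > 0.0:
        return None  # no root of f in [u_bar, 1]
    lo, hi = u_bar, 1.0  # f(lo) <= 0 and f(hi) = d + lam*p > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid, d, gamma, lam, p) <= 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

gamma, lam, p = 1.0, 0.1, 0.5  # illustrative values only
for d in (0.5, 0.5 + 1e-3, 0.5 + 1e-6):
    print(f"d={d:.6f}  rightmost root u={largest_root(d, gamma, lam, p):.8f}")
```

Shifting $d$ by a small amount moves the rightmost root by a comparably small amount, which is the behaviour the proof relies on.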



Fig. 3: Graphical presentation of the continuity of the mapping $\{d_{ij} \rightarrow u_{ij}\}$. Small variations in $d_{ij}$ cause small variations in $u_{ij}$.


As a direct consequence of Propositions 1 and 2, we have the following lemma. 

Lemma 5: $T$ is continuous on $U_I \times I$.

Proof: Recall that $T = T_2 \circ T_1$ and that $T_2$ and $T_1$ are defined in terms of $G$ and $F$, respectively (eqs. (14), (15)). $G$ is continuous on $U_I$, as a consequence of Proposition 1, while $F$ is continuous on $I$ from Proposition 2. Thus, $T$ is continuous on $U_I \times I$ as a composition of two continuous functions. Q.E.D.

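3) Proof of item (C): We proceed now to prove that the sequence $(u^{(t)}, \theta^{(t)})$ produced by the SPCM falls in a bounded set.

Lemma 6: Let $(u^{(0)}, \theta^{(0)})$ be the starting point of the iteration with the SPCM operator $T$, with $\theta^{(0)} \in CH(X)$ and $u^{(0)} = F(\theta^{(0)})$. Then

$$(u^{(t)}, \theta^{(t)}) = T^t(u^{(0)}, \theta^{(0)}) \in [0,1]^N \times CH(X)$$

Proof: For a given $\theta^{(0)} \in CH(X)$, $u^{(0)} = F(\theta^{(0)}) \in [0,1]^N$, since $u_i^{(0)} \in [0,1]$ (see eq. (10) and the associated argumentation). Also, $\theta^{(1)} = G(u^{(0)})$ is computed by eq. (12), which can be recast as

$$\theta^{(1)} = \sum_{i=1}^{N} \frac{u_i^{(0)}}{\sum_{j=1}^{N} u_j^{(0)}} x_i$$

Since $u_i^{(0)} \in [0,1]$, it easily follows that $0 \le \frac{u_i^{(0)}}{\sum_{j=1}^{N} u_j^{(0)}} \le 1$ and $\sum_{i=1}^{N} \frac{u_i^{(0)}}{\sum_{j=1}^{N} u_j^{(0)}} = 1$. Thus $\theta^{(1)}$ is a convex combination of the data points, that is, $\theta^{(1)} \in CH(X)$. Continuing recursively we have $u^{(1)} = F(\theta^{(1)}) \in [0,1]^N$ by eq. (10) and $\theta^{(2)} = G(u^{(1)}) \in CH(X)$, using the same argumentation as above. Thus, inductively, we conclude that

$$(u^{(t)}, \theta^{(t)}) = T^t(u^{(0)}, \theta^{(0)}) \in [0,1]^N \times CH(X)$$

Q.E.D.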
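The convex-combination argument of the proof can be illustrated with a short Python sketch. It assumes that the $\theta$-update is the $u$-weighted mean of the data points (as in eq. (44)); the helper name `theta_update` and the random data are hypothetical. Containment in the axis-aligned bounding box of $X$ is checked only as a necessary condition for membership in $CH(X)$.

```python
import numpy as np

def theta_update(u, X):
    # Normalised weights: w_i >= 0 and sum_i w_i = 1 whenever sum_i u_i > 0.
    w = u / u.sum()
    # theta is the convex combination sum_i w_i x_i of the rows of X.
    return w, w @ X

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 2))           # hypothetical data set, N = 8, l = 2
u = rng.uniform(0.0, 1.0, size=8)     # degrees of compatibility in [0, 1]
u[[2, 5]] = 0.0                       # two inactive points (u_i = 0)

w, theta = theta_update(u, X)
print("weights sum to", float(w.sum()))
print("theta =", theta)
print("inside bounding box of X:",
      bool(np.all(theta >= X.min(axis=0)) and np.all(theta <= X.max(axis=0))))
```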

Remark 4: Note that it is possible to have $\theta^{(0)}$ outside $CH(X)$, yet in a position where at least one $u_i$ is positive. However, computing $u^{(0)} = F(\theta^{(0)})$ by eq. (10), the latter will lie in $[0,1]^N$ and, as a consequence, $\theta^{(1)} = G(u^{(0)})$ will lie in $CH(X)$, as follows from the argumentation given in the proof of Lemma 6.


4) Proof of item (D): In the sequel, we will prove that the elements of the set S (eq. 
for a given valid active set with hyperspheres intersection $I$ (if they exist) are strict local minima of the cost function $J$ and thus the cardinality of $S$ is finite.

The elements of $S$ are the solutions $z^* = (u^*, \theta^*) = (u_1^*, \ldots, u_k^*, \theta_1^*, \ldots, \theta_l^*)$ of $\nabla J|_{(u,\theta)} = 0$, with $u_i^*$ being the largest of the two solutions of $f_{\theta^*}(u_i) = 0$, $i = 1, \ldots, k$. They should satisfy the following equations

$$\sum_{i=1}^{k} u_i^* (\theta_q^* - x_{iq}) = 0, \quad q = 1, \ldots, l \tag{45}$$

and

$$\|x_i - \theta^*\|^2 + \gamma \ln u_i^* + \lambda p\, u_i^{*\,p-1} = 0, \quad i = 1, \ldots, k \tag{46}$$

Then, we have the following lemma.

Lemma 7: The points $z^*$ that satisfy eqs. (45) and (46) (if they exist) are strict local minima of $J$ in the domain $U_I \times I$. Moreover, their number is finite.

Proof: 

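In order to prove that the points $z^*$ are local minima we need to prove that the Hessian matrix of $J$ computed at $z^*$, $H_{z^*}$, is positive definite over a small region around $z^*$. It is

$$H_{z^*} = \begin{bmatrix}
g_1^* & 0 & \cdots & 0 & 2(\theta_1^* - x_{11}) & \cdots & 2(\theta_l^* - x_{1l}) \\
0 & g_2^* & \cdots & 0 & 2(\theta_1^* - x_{21}) & \cdots & 2(\theta_l^* - x_{2l}) \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & g_k^* & 2(\theta_1^* - x_{k1}) & \cdots & 2(\theta_l^* - x_{kl}) \\
2(\theta_1^* - x_{11}) & 2(\theta_1^* - x_{21}) & \cdots & 2(\theta_1^* - x_{k1}) & 2\sum_{i=1}^{k} u_i^* & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\
2(\theta_l^* - x_{1l}) & 2(\theta_l^* - x_{2l}) & \cdots & 2(\theta_l^* - x_{kl}) & 0 & \cdots & 2\sum_{i=1}^{k} u_i^*
\end{bmatrix} \tag{47}$$

where

$$g_i^* = \frac{1}{u_i^*}\left(\gamma - \lambda p (1-p)\, u_i^{*\,p-1}\right), \quad i = 1, \ldots, k \tag{48}$$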


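Let $z' = (u', \theta') = (u_1', \ldots, u_k', \theta_1', \ldots, \theta_l')$ be a point in $U_I \times I$ that is close to $z^*$. More specifically, let $u_1', \ldots, u_k'$ be close to $u_1^*, \ldots, u_k^*$, respectively, so that

$$\left\| \theta^* - \frac{\sum_{i=1}^{k} u_i' x_i}{\sum_{i=1}^{k} u_i'} \right\| < \varepsilon \tag{49}$$

Footnote: Without loss of generality, we assume that the $x_i$'s, $i = 1, \ldots, k$, are the active points of the valid active set under study.

After some straightforward algebraic operations it follows that

$$z'^T H_{z^*} z' = 2\sum_{i=1}^{k} u_i^* \|\theta'\|^2 + 4\sum_{i=1}^{k} u_i' \sum_{q=1}^{l} \theta_q' (\theta_q^* - x_{iq}) + \sum_{i=1}^{k} g_i^* u_i'^2 \tag{50}$$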

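It is easy to verify that

$$\sum_{i=1}^{k} u_i' \sum_{q=1}^{l} \theta_q' (\theta_q^* - x_{iq}) = \sum_{i=1}^{k} u_i'\ \theta'^T \left( \theta^* - \frac{\sum_{i=1}^{k} u_i' x_i}{\sum_{i=1}^{k} u_i'} \right) \ge -\sum_{i=1}^{k} u_i' \|\theta'\| \varepsilon$$

where the inequality follows from the Cauchy-Schwarz inequality and eq. (49).

Utilizing the fact that $u_i^* \ge u^{\min} = \left(\frac{\lambda(1-p)}{\gamma}\right)^{1/(1-p)}$, $i = 1, \ldots, k$, for the second appearance of $u_i^*$ in the right hand side of (48), it turns out that $g_i^* \ge \frac{(1-p)\gamma}{u_i^*}$.

Combining the last two inequalities with eq. (50), it follows that

$$z'^T H_{z^*} z' \ge 2\sum_{i=1}^{k} u_i^* \|\theta'\|^2 - 4\sum_{i=1}^{k} u_i' \|\theta'\| \varepsilon + (1-p)\gamma \sum_{i=1}^{k} \frac{u_i'^2}{u_i^*} \equiv \phi(\|\theta'\|) \tag{51}$$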

Since $\sum_{i=1}^{k} u_i^* > 0$, the second degree polynomial $\phi(\|\theta'\|)$ is positive everywhere if and only if its discriminant

$$\Delta = 8\left[ 2\varepsilon^2 \Big(\sum_{i=1}^{k} u_i'\Big)^2 - (1-p)\gamma \sum_{i=1}^{k} u_i^* \sum_{i=1}^{k} \frac{u_i'^2}{u_i^*} \right] \tag{52}$$

is negative. But, from Proposition A2 in the Appendix, it is

$$\Big(\sum_{i=1}^{k} u_i'\Big)^2 \le \sum_{i=1}^{k} u_i^* \sum_{i=1}^{k} \frac{u_i'^2}{u_i^*}$$

Also, choosing $\varepsilon < \sqrt{\tfrac{1}{2}(1-p)\gamma}$, $\Delta$ becomes negative. As a consequence and due to the continuity of $J$ in $U_I \times I$, $\varepsilon$ defines a region around $z^*$, for which $z'^T H_{z^*} z' > 0$. Thus $z^*$ is a strict local minimum.

In addition, since the domain $U_I \times I$ is bounded, it easily follows that the number of strict local minima is finite. Q.E.D.
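Two steps of the above proof can be checked numerically: the lower bound on $g_i^*$ and the sign of the discriminant. The Python sketch below assumes the readings $u^{\min} = (\lambda(1-p)/\gamma)^{1/(1-p)}$ and $g(u) = \gamma/u - \lambda p(1-p)u^{p-2}$; the function names, the random vectors and the parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def u_min(gamma, lam, p):
    # Assumed form of u^min: the u at which lam*(1-p)*u^(p-1) equals gamma.
    return (lam * (1.0 - p) / gamma) ** (1.0 / (1.0 - p))

def g(u, gamma, lam, p):
    # Diagonal Hessian entry: second derivative of h(u; theta) w.r.t. u.
    return gamma / u - lam * p * (1.0 - p) * u ** (p - 2.0)

def discriminant(eps, u_star, u_prime, gamma, p):
    # Discriminant of the quadratic phi(||theta'||), following eq. (52).
    return 8.0 * (2.0 * eps ** 2 * u_prime.sum() ** 2
                  - (1.0 - p) * gamma * u_star.sum() * (u_prime ** 2 / u_star).sum())

gamma, lam, p = 1.0, 0.1, 0.5                      # illustrative values only
us = np.linspace(u_min(gamma, lam, p), 1.0, 1000)  # grid on [u_min, 1]
lower = (1.0 - p) * gamma / us
print("min of g(u) - (1-p)*gamma/u on the grid:",
      float((g(us, gamma, lam, p) - lower).min()))

rng = np.random.default_rng(0)
u_star = rng.uniform(0.1, 1.0, 5)
u_prime = rng.uniform(0.1, 1.0, 5)
eps = 0.9 * np.sqrt((1.0 - p) * gamma / 2.0)       # below sqrt((1-p)*gamma/2)
print("discriminant:", float(discriminant(eps, u_star, u_prime, gamma, p)))
```

The bound on $g$ is tight at $u = u^{\min}$ and slack elsewhere, and the discriminant is negative for any $\varepsilon$ below the stated threshold.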

Remark 5: It can be shown that in the specific case where (a) [...] and (b) $K$ in eq. (5) is chosen in the range [...], the set $S_q$ that corresponds to each valid active set $X_q$ has one element at the most. The proof of this fact follows the line of proof of Lemma 7, with the difference that $\varepsilon$ in eqs. (49), (51) and (52) is replaced by $R$ (since the maximum possible distance between two points in the (nonempty) intersection of hyperspheres of radius $R$ is equal to $R$). Then, the conditions (a) and (b) above follow from the requirement to have $2R^2 < (1-p)\gamma$, in order to have negative discriminant $\Delta$. Utilizing eq. (5) in the previous requirement it follows that $K >$ [...]. Taking into account that $K < p e^{2(1-p)}$ (Proposition A1), condition (a) results from the requirement to have [...].

In the sequel we denote by $Y_{z^*}$ a region around a point $z^*$ in the set $S_q$ corresponding to a valid active set $X_q$, where $J$ is convex. $Y_{z^*}$ will be called a valley around $z^*$ (such a region always exists, as shown in Proposition A3).

Having completed the proof of the prerequisites (A)-(D) and before we proceed any further, some remarks are in order.

Remark 6: Although $J$ is well defined in $[0,1]^N \times \mathcal{R}^l$, there are several regions in the landscape of $J(u, \theta)$ that are not accessible by the algorithm. For example, some positions $(u, \theta)$ where $u_i < u^{\min}$, and those where $\theta$ is expressed through eq. (12) with coefficients $u_i$ less than $u^{\min}$, are not accessible by the algorithm.

Remark 7: We highlight again the fact that a certain set of active points $X_q$, with corresponding (nonempty) intersection of hyperspheres $I_q$ and $U_{I_q}$, $\Theta_{I_q}$ as defined by eqs. (16) and (17), respectively, may have no local minima of $J$ in $U_{I_q} \times I_q$ that are accessible by $T$. Equivalently, this means that the solution set $S_q$ (see Lemma 3) corresponding to $X_q$ is empty.




Fig. 4: (a) An active set of $k = 3$ points where $(I \cap (\cap_{i:\, u_i = 0}\, ext(C_i))) \ne I$ and (b) an active set of $k = 4$ points where $(I \cap (\cap_{i:\, u_i = 0}\, ext(C_i))) = I$.

We prove next the following lemma.

Lemma 8: There exists at least one valid active set $X_q$ (with $I_q \ne \emptyset$) for which there exists at least one local minimum $(u_q^*, \theta_q^*)$, with $\theta_q^* \in I_q \cap (\cap_{i:\, u_i = 0}\, ext(C_i))$.

Proof: Suppose on the contrary that for all possible active sets $X_q$, there is no local minimum with $\theta_q^* \in I_q \cap (\cap_{i:\, u_i = 0}\, ext(C_i))$ (see Fig. 4). Equivalently, this means that the solution sets $S_q$ for all valid active sets are empty. Then from Lemma 3 we have that if at a certain iteration $t_1$, $\theta(t_1)$ belongs to the intersection $I_q$ of a certain active set $X_q$, the algorithm may move $\theta(t)$ ($t > t_1$) to other positions in $I_q$ that always strictly decrease the value of $J$. Since $J$ is bounded below (due to the fact that $u \in [0,1]^N$ and $\theta \in CH(X)$) it follows that $\theta$ will leave $I_q$ at a certain iteration. In addition, Lemma 4 secures the decrease of the value of $J$ as we move from one hypersphere intersection to another (or, equivalently, from one active set to another). Thus, the algorithm will always move $(u(t), \theta(t))$ from one position to another in the domain $[0,1]^N \times CH(X)$, without converging to any one of them, while, at the same time, the value of $J$ decreases from iteration to iteration.

Assuming that at a specific iteration $t'$, $\theta(t')$ belongs to a certain $I_q$, then, due to the continuity of $J$ in $I_q$, there exists a region $V(t')$ around $(u(t'), \theta(t'))$, for which $J(u, \theta) > J(u(t'+1), \theta(t'+1))$, for $(u, \theta) \in V(t')$.

From the previous argumentation, it follows that, since the domain where $(u(t), \theta(t))$ moves is bounded, the regions $V(t)$ (defined as above) will cover the regions of the whole domain that are accessible by $T$. Thus there exists an iteration $t''$ at which the algorithm will visit a point in the region $V(t')$, where $t'$ is an iteration the algorithm has performed before ($t' < t''$). Then, due to the strict decrease of $J$ as SPCM evolves, we have that $J(u(t''), \theta(t'')) < J(u(t'+1), \theta(t'+1)) < J(u(t'), \theta(t'))$. However, since $(u(t''), \theta(t'')) \in V(t')$, it follows that $J(u(t''), \theta(t'')) > J(u(t'+1), \theta(t'+1))$, which leads to a contradiction. Therefore, there exists at least one active set $X_q$ for which there exists at least one local minimum $(u_q^*, \theta_q^*)$, with $\theta_q^* \in I_q \cap (\cap_{i:\, u_i = 0}\, ext(C_i))$. Q.E.D.

Now we are in a position to state the general theorem concerning the convergence of SPCM.

Footnote: Note that $\theta_q^* \in \Theta_{I_q}$ due to the definition of the latter set from eq. (17).

Theorem 2: Suppose that a data set $X = \{x_i \in \mathcal{R}^l,\ i = 1, \ldots, N\}$ is given. Let $J(u, \theta)$ be defined as in eq. (11) for $m = 1$, where $(u, \theta) \in \mathcal{M} \times CH(X)$. If $T : \mathcal{M} \times CH(X) \rightarrow \mathcal{M} \times CH(X)$ is the operator corresponding to the SPCM algorithm, then for any $(u(0), \theta(0)) \in \mathcal{M} \times CH(X)$ the SPCM converges to one of the points of the set $S_q$ that corresponds to a valid active set $X_q$, $z_q^* = (u_q^*, \theta_q^*)$, provided that $\theta_q^* \in I_q \cap (\cap_{i:\, u_i = 0}\, ext(C_i))$.

Proof: Following a reasoning similar to that of Lemma 8, we have that the regions of the whole space that are accessible by $T$ will eventually be covered by regions $V(t')$ defined as in the proof of Lemma 8. Then the algorithm

(i) either will visit a valley $Y_{z_q^*}$ in $U_{I_q} \times I_q$ around a (strict) local minimum $(u_q^*, \theta_q^*)$ of a certain active set $X_q$ and, as a consequence of Theorem 1 (due to (a) the local convexity of $J$ in $Y_{z_q^*}$, (b) the monotonic decrease of $J$ with $T$, (c) the continuity of $T$ in the corresponding $U_I \times I$ and (d) the uniqueness of the minimum in this valley), it will converge to it,

(ii) or it will never visit the valley of such a local minimum. This means that the algorithm starts from a $(u(0), \theta(0))$ whose $J(u(0), \theta(0))$ is less than the values of $J$ at all local minima. However, this case can be rejected following exactly the same reasoning as in the proof of Lemma 8.

Therefore, the algorithm will converge to a local minimum $\theta_q^*$ that corresponds to one of the possible active sets $X_q$ (with $I_q \ne \emptyset$), provided that $\theta_q^* \in I_q \cap (\cap_{i:\, u_i = 0}\, ext(C_i))$. Q.E.D.

A. Fulfilling the Assumption 1 

Next, we show how Assumption 1, which requires that at each iteration of SPCM at least one equation $f(u_{ij}) = 0$, $i = 1, \ldots, N$, for each cluster $C_j$, $j = 1, \ldots, m$, has two solutions, can always be kept valid. In other words, we show that each cluster has at least one data point $x_i$, $i = 1, \ldots, N$, with $u_{ij} > 0$ at each iteration. To this end, we will prove that (a) Assumption 1 is fulfilled at the initial step of SPCM (base case) and (b) this inductively holds also for each subsequent iteration of the algorithm (induction step).

(a) Base case: Taking into account that the initialization of SPCM is defined by the FCM algorithm and in particular eq. [...], it is obvious that initially each cluster $C_j$ with representative $\theta_j$ has at least one data point with $\|x_i - \theta_j\|^2 \le \gamma_j$. Focusing on a certain cluster $C_j$, let $x_q$ be the data point closest to $\theta_j$, where $\theta_j$ denotes the initial (FCM) estimate of the representative of $C_j$. Then, in general, $\|x_q - \theta_j\|^2 \ll \gamma_j$. According to Proposition A4 (see Appendix), this data point has $u_{qj} > 0$, if $K < \frac{\gamma_j}{\bar{\gamma}}\, p\, e^{(2-\mu_j)(1-p)}$, where here $\mu_j = \frac{\|x_q - \theta_j\|^2}{\gamma_j}$ ($\ll 1$). In order to fulfill Assumption 1 for each cluster, $K$ should be chosen such that $K < \min_{j=1,\ldots,m} \left[ \frac{\gamma_j}{\bar{\gamma}}\, p\, e^{(2-\mu_j)(1-p)} \right]$. Also, it is $\min_{j=1,\ldots,m} \left[ \frac{\gamma_j}{\bar{\gamma}}\, p\, e^{(2-\mu_j)(1-p)} \right] \ge p\, e^{(2-\mu_{max})(1-p)}$, where we recall that $\bar{\gamma} = \min_{j=1,\ldots,m} \gamma_j$. Thus, if $K$ is chosen so that $K < p\, e^{(2-\mu_{max})(1-p)} \equiv B(p)$, where $\mu_{max} = \max_{j=1,\ldots,m} \mu_j$ ($\ll 1$), Assumption 1 is satisfied. Note also that $B(p) < p\, e^{2(1-p)}$, thus the condition of Proposition A1 is valid.

In Fig. 5 the upper bound $B(p)$ of $K$ is illustrated with respect to parameter $p$ for different values of $\mu_{max}$, so that each initial cluster has at least one data point with $u_{ij} > 0$. Note that $K = 0.9$ is an appropriate value for $p = 0.5$ that ensures that Assumption 1 is fulfilled at the initial step of SPCM (this is the choice made for $K$ in [...]).



Fig. 5: The upper bound $B(p)$ of $K$ with respect to parameter $p$ for different values of $\mu_{max}$, so that each initial cluster has at least one data point with $u_{ij} > 0$.
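The behaviour of the bound shown in Fig. 5 can be tabulated with a few lines of Python. The sketch assumes the form $B(p) = p\, e^{(2-\mu_{max})(1-p)}$, which is a reconstruction consistent with Propositions A1 and A4; the function names and the sample values of $\mu_{max}$ are hypothetical.

```python
import math

def upper_bound_B(p, mu_max):
    # Assumed form of the bound: B(p) = p * exp((2 - mu_max) * (1 - p)).
    return p * math.exp((2.0 - mu_max) * (1.0 - p))

def bound_A1(p):
    # Bound of Proposition A1: p * exp(2 * (1 - p)).
    return p * math.exp(2.0 * (1.0 - p))

ps = (0.1, 0.3, 0.5, 0.7, 0.9)
for mu_max in (0.05, 0.1, 0.2):  # hypothetical values of mu_max
    row = [round(upper_bound_B(p, mu_max), 3) for p in ps]
    print(f"mu_max={mu_max}: B(p) for p in {ps} -> {row}")

# K = 0.9 with p = 0.5 stays below the bound for small mu_max.
print("K = 0.9 admissible for p = 0.5, mu_max = 0.1:",
      0.9 < upper_bound_B(0.5, 0.1))
```

Under this assumed form, the bound always lies below the Proposition A1 bound for $\mu_{max} > 0$, so a $K$ chosen below $B(p)$ also keeps $R_j$ positive.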


(b) Induction step: Let us focus on a specific cluster $C$. Assume that at iteration $t$, its representative is $\theta(t)$ and it has a certain set of active points $X^t$ with its corresponding nonempty intersection of hyperspheres, denoted by $I^t$. Obviously, it is $CH(X^t) \subset (\cup_{i:\, u_i > 0}\, int(C_i))$. Taking into consideration that all possible positions of $\theta(t+1)$ lie inside $CH(X^t)$, we have that $\theta(t+1)$ will lie inside $\cup_{i:\, u_i > 0}\, int(C_i)$. As a consequence, there exists at least one data point of $X^t$ that will remain active at the next iteration of the algorithm.

As a result, each cluster will have at least one data point $x_i$, $i = 1, \ldots, N$, with $u_i > 0$ at each iteration of SPCM.

Footnote: For notational convenience, we drop the cluster index $j$ for the rest of this subsection.

Footnote: We drop the index $q$, in order to lighten the notation. Index $t$ shows the time dependence of the active set corresponding to $C$, as it evolves in time.


IV. On the convergence of the PCM$_2$ algorithm

In [11] it is proved that the sequence $(u^{(t)}, \theta^{(t)})$ produced by PCM$_2$ either (i) terminates at a local minimum or a saddle point of $J$, or (ii) has every convergent subsequence terminating at a local minimum or a saddle point of $J$. This result follows as a direct application of Zangwill's convergence theorem ([10]). However, viewing PCM$_2$ as a special case of SPCM, we can utilize the convergence results of the latter to establish stronger results for PCM$_2$, compared to those given in [11].

Let us be more specific. We focus again on a single $\theta$ and its corresponding $u = [u_1, \ldots, u_N]^T$ vector. Note that $J_{PCM_2}$ results directly from $J_{SPCM}$, for $\lambda = 0$. In this case, the radius $R$ becomes infinite for any (finite) value of $p$. This means that the convex hull of $X$, $CH(X)$, lies entirely in the intersection of the hyperspheres centered at the data points of $X$. As a consequence, $u_i > 0$, for $i = 1, \ldots, N$. This implies that the whole $X$ is the active set. Also, note that for $\lambda = 0$, $f(u_i) = 0$ gives a single positive solution, i.e. $u_i = \exp\left(-\frac{d_i}{\gamma}\right)$, with $d_i = \|x_i - \theta\|^2$.

Let us define the solution set $S$ for PCM$_2$ as

$$S_{PCM_2} = \{(u, \theta) \in [0,1]^N \times CH(X) : \nabla J|_{(u,\theta)} = 0\}$$
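To make the special case concrete, the following Python sketch iterates the single-cluster PCM$_2$ updates, assuming the closed-form solution $u_i = \exp(-\|x_i - \theta\|^2/\gamma)$ for $\lambda = 0$, the weighted-mean update of $\theta$, and the cost $J = \sum_i [u_i \|x_i - \theta\|^2 + \gamma(u_i \ln u_i - u_i)]$. The data, the value of $\gamma$, the number of iterations and the function names are hypothetical choices made only for illustration.

```python
import numpy as np

def J_pcm2(u, theta, X, gamma):
    # Cost for one cluster with lambda = 0 (assumed form of J_PCM2).
    d = ((X - theta) ** 2).sum(axis=1)
    return float((u * d + gamma * (u * np.log(u) - u)).sum())

def pcm2_step(theta, X, gamma):
    # u = F(theta): unique positive root of d_i + gamma * ln(u_i) = 0.
    d = ((X - theta) ** 2).sum(axis=1)
    u = np.exp(-d / gamma)
    # theta_new = G(u): u-weighted mean of the data points.
    theta_new = (u[:, None] * X).sum(axis=0) / u.sum()
    return u, theta_new

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)),   # dense group
               rng.normal(3.0, 0.3, (5, 2))])   # small distant group
gamma = 1.0
theta = X.mean(axis=0)                           # starting point inside CH(X)

u, _ = pcm2_step(theta, X, gamma)
history = [J_pcm2(u, theta, X, gamma)]
for _ in range(50):
    u, theta = pcm2_step(theta, X, gamma)
    history.append(J_pcm2(u, theta, X, gamma))

print("first and last values of J:", history[0], history[-1])
print("final theta:", theta)
```

The recorded values of $J$ are non-increasing along the iterations, all $u_i$ stay in $(0, 1]$ and $\theta$ remains a convex combination of the data points, in line with the requirements (i) and (iii) discussed next.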


The requirements for (i) the decrease of $J_{PCM_2}$, (ii) the continuity of $T_{PCM_2}$ (the operator that corresponds to PCM$_2$, defined in a fashion similar to $T$) and (iii) the boundedness of the sequence produced by PCM$_2$ can be viewed as special cases of Lemmas 3, 5 and 6, respectively, where $U_I \times I$ is replaced by $[0,1]^N \times CH(X)$. Then Theorem A1 (see Appendix) guarantees that there exist fixed points for $T_{PCM_2}$ and Lemma 7 proves that these are strict local minima of $J_{PCM_2}$. Finally, in correspondence with SPCM, the following theorem can be established for PCM$_2$.


Footnote: The only slight difference compared to SPCM concerns the establishment of requirement (i). Specifically, in the proof of Lemma 1 in (eq. [...]) it turns out that for PCM$_2$, it is $K_3 = -\infty$, which still contradicts the fact that $K_3$ is finite. Also, in (27) in the same proof it results that [...] $< 0$, which gives also a contradiction.

Footnote: The only thing that is differentiated in the PCM$_2$ case is that $g_i^* = \frac{\gamma}{u_i^*}$. As a consequence, $\varepsilon$ is chosen as $\varepsilon <$ [...].

Theorem 3: Suppose that a data set $X = \{x_i \in \mathcal{R}^l,\ i = 1, \ldots, N\}$ is given. Let $J_{PCM_2}(u, \theta)$ be defined by eq. [...] for $m = 1$, where $(u, \theta) \in [0,1]^N \times CH(X)$. If $T_{PCM_2} : [0,1]^N \times CH(X) \rightarrow [0,1]^N \times CH(X)$ is the operator corresponding to the PCM$_2$ algorithm, then for any $(u(0), \theta(0)) \in [0,1]^N \times CH(X)$, the PCM$_2$ algorithm converges to a fixed point of $T_{PCM_2}$ (which is a local minimum of $J_{PCM_2}$).

V. Conclusion 

In this paper, a convergence proof for the recently proposed sparse possibilistic c-means (SPCM) algorithm is provided. The main source of difficulty in the SPCM convergence analysis, compared to those given for previous possibilistic algorithms, lies in the updating of the degrees of compatibility, which are not given in closed form and are computed via a two-branch expression. In the present paper, it is shown that the iterative sequence generated by SPCM converges to a local minimum (fixed point) of its associated cost function $J_{SPCM}$. Finally, the above analysis for SPCM has been applied to the case of PCM$_2$ ([5]) and gave much stronger convergence results compared to those provided in [11].


Appendix 

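Proposition A1: If $K < p\, e^{2(1-p)}$, then $R_j > 0$.

Proof: Substituting $\lambda$ from eq. (5) into the definition of $R_j$ and after some manipulations, we have

$$R_j^2 = \frac{\gamma_j}{1-p} \left( \ln \frac{p\, \gamma_j}{K \bar{\gamma}} + 2(1-p) \right)$$

or, since $\frac{\bar{\gamma}}{\gamma_j} \le 1$,

$$R_j^2 \ge \frac{\gamma_j}{1-p} \left( \ln \frac{p}{K} + 2(1-p) \right)$$

Straightforward operations show that the positivity of the quantity in parenthesis is equivalent to the hypothesis condition $K < p\, e^{2(1-p)}$. Q.E.D.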


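Proposition A2: It is $\left(\sum_{i=1}^{k} u_i'\right)^2 \le \sum_{i=1}^{k} u_i^* \sum_{i=1}^{k} \frac{u_i'^2}{u_i^*}$, for $u_i^*, u_i' > 0$, $i = 1, \ldots, k$.

Proof: It is

$$\Big(\sum_{i=1}^{k} u_i'\Big)^2 \le \sum_{i=1}^{k} u_i^* \sum_{i=1}^{k} \frac{u_i'^2}{u_i^*} \Leftrightarrow \sum_{i=1}^{k} u_i'^2 + 2\sum_{i=1}^{k} \sum_{j=i+1}^{k} u_i' u_j' \le \sum_{i=1}^{k} \sum_{j=1}^{k} \frac{u_i^*}{u_j^*} u_j'^2 \Leftrightarrow$$

$$\sum_{i=1}^{k} \sum_{j=i+1}^{k} \left( \frac{u_i^*}{u_j^*} u_j'^2 + \frac{u_j^*}{u_i^*} u_i'^2 - 2 u_i' u_j' \right) \ge 0 \Leftrightarrow \sum_{i=1}^{k} \sum_{j=i+1}^{k} \frac{(u_i^* u_j' - u_j^* u_i')^2}{u_i^* u_j^*} \ge 0$$

which obviously holds. Q.E.D.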
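The algebraic identity behind this proof can be verified numerically. The Python sketch below compares the gap between the two sides of the inequality with the sum of squared cross terms over all pairs $i < j$; the function names and the random test vectors are illustrative only.

```python
import numpy as np

def gap(u_star, u_prime):
    # Right-hand side minus left-hand side of the inequality in Proposition A2.
    return u_star.sum() * (u_prime ** 2 / u_star).sum() - u_prime.sum() ** 2

def pairwise_form(u_star, u_prime):
    # Sum over i < j of (u*_i u'_j - u*_j u'_i)^2 / (u*_i u*_j).
    k = len(u_star)
    total = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            cross = u_star[i] * u_prime[j] - u_star[j] * u_prime[i]
            total += cross ** 2 / (u_star[i] * u_star[j])
    return total

rng = np.random.default_rng(0)
results = []
for _ in range(5):
    u_star = rng.uniform(0.05, 1.0, 6)    # strictly positive entries
    u_prime = rng.uniform(0.05, 1.0, 6)
    results.append((gap(u_star, u_prime), pairwise_form(u_star, u_prime)))

for g_val, p_val in results:
    print(f"gap={g_val:.6f}  pairwise sum={p_val:.6f}")
```

Both quantities coincide up to rounding and are non-negative, with equality exactly when $u'$ is proportional to $u^*$.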


Proposition A3: Let $z^* = (u^*, \theta^*) \in S_q$ corresponding to a certain active set $X_q$. Let also $Y_{z^*} = Y_u \times Y_\theta$ be a set of $(u, \theta)$, such that $Y_u = \{u : \|\theta^* - G(u)\| < \varepsilon\}$, where $\varepsilon < \sqrt{\tfrac{1}{8}(1-p)\gamma}$, and $Y_\theta = \{\theta : \theta = G(u),\ u \in Y_u\}$. Then (a) $Y_{z^*}$ is a convex set and (b) $J$ is a convex function over $Y_{z^*}$.

Proof: (a) Since the domain $Y_u$ of $u$ is a cartesian product of closed one-dimensional intervals, it is convex. In addition, the set $Y_\theta$ is also convex by its definition. Thus $Y_{z^*}$ is convex.

(b) We prove that for any $z \in Y_{z^*}$, it is $z'^T H_z z' > 0$, $\forall z' \in Y_{z^*}$. Following a reasoning similar to that in Lemma 7, we end up with the following inequality (which corresponds to eq. (51))

$$z'^T H_z z' \ge 2\sum_{i=1}^{k} u_i \|\theta'\|^2 - 4\sum_{i=1}^{k} u_i' \|\theta'\| (2\varepsilon) + (1-p)\gamma \sum_{i=1}^{k} \frac{u_i'^2}{u_i} \equiv \phi(\|\theta'\|) \tag{53}$$

Note that the factor $2\varepsilon$ in the right hand side of the above inequality results from the fact that this is the maximum possible difference between two elements in $Y_\theta$. The discriminant of $\phi(\|\theta'\|)$ is

$$\Delta = 8\left[ 8\varepsilon^2 \Big(\sum_{i=1}^{k} u_i'\Big)^2 - (1-p)\gamma \sum_{i=1}^{k} u_i \sum_{i=1}^{k} \frac{u_i'^2}{u_i} \right] \tag{54}$$

Proposition A2 and the choice of $\varepsilon$ guarantee that $\Delta$ is negative, which implies that $z'^T H_z z' > 0$ and, as a consequence, $J$ is convex over $Y_{z^*}$. Q.E.D.

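Proposition A4: A data point $x$ has $u > 0$ with respect to a cluster $C$ with representative $\theta$ and parameter $\gamma$ or, equivalently, $f(u) = 0$ has solution(s), if $K < \frac{\gamma}{\bar{\gamma}}\, p\, e^{(2-\mu)(1-p)}$, where $\mu = \frac{\|x - \theta\|^2}{\gamma}$.

Proof: According to eq. (10), a data point $x$ has $u > 0$ if and only if $\|x - \theta\|^2 < R^2 \Leftrightarrow \|x - \theta\|^2 < \frac{\gamma}{1-p}\left(-\ln\frac{\lambda(1-p)}{\gamma} - p\right) \Leftrightarrow \mu < \frac{1}{1-p}\left(-\ln\frac{\lambda(1-p)}{\gamma} - p\right)$, which, using eq. (5), gives

$$\mu < \frac{1}{1-p}\left(-\ln\frac{K\bar{\gamma}}{\gamma\, p\, e^{2-p}} - p\right) \Leftrightarrow \mu(1-p) < -\ln\frac{K\bar{\gamma}}{p\,\gamma} + 2 - 2p \Leftrightarrow (2-\mu)(1-p) > \ln\frac{K\bar{\gamma}}{p\,\gamma} \Leftrightarrow \frac{\gamma}{\bar{\gamma}}\, p\, e^{(2-\mu)(1-p)} > K$$

Q.E.D.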

Theorem A1 (Leray-Schauder-Tychonoff fixed point theorem, e.g. [...]): If $X$ is a nonempty, convex and compact subset of a finite-dimensional Euclidean space and if $Z : X \rightarrow X$ is a continuous function, there exists $x^* \in X$, such that $Z(x^*) = x^*$ (fixed point).